Poster in Workshop: Learning Meaningful Representations of Life
Tuned Quadratic Landscapes for Benchmarking Model-Guided Protein Design
Neil Thomas · Atish Agarwala · David Belanger · Yun Song · Lucy Colwell
Advances in DNA synthesis and sequencing technologies have enabled a new paradigm of protein design, in which machine learning models trained on experimental data guide the exploration of a protein sequence landscape. ML-guided directed evolution (MLDE) has the potential not only to build on the successes of directed evolution but also to unlock new strategies that use experimental data more efficiently and trade off between multiple optimization objectives. Building an MLDE pipeline involves many design choices, from data collection strategies to modeling decisions, and each of them strongly affects the downstream effectiveness of the designed sequences. Because experimental data are costly to collect, benchmarking these pipelines on real data is prohibitively expensive, which motivates synthetic landscapes on which MLDE strategies can be tested. In this work, we develop a framework called SLIP (“Synthetic Landscape Inference for Proteins”) for constructing synthetic landscapes with tunable difficulty based on Potts models. SLIP is open-source.
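As a rough illustration of what a Potts-model landscape looks like, the sketch below scores a sequence with the standard pairwise Potts energy, E(s) = Σ_i h_i(s_i) + Σ_{i<j} J_ij(s_i, s_j), and measures pairwise epistasis for a double mutant. Everything here is a hypothetical, minimal sketch and not SLIP's API: the function names, the Gaussian draws for fields and couplings, and the `coupling_scale` knob are assumptions. The knob is shown only because setting it to zero yields a purely additive (epistasis-free) landscape while larger values add interactions; SLIP's actual mechanism for tuning difficulty may differ.

```python
import random
from typing import List, Sequence

# 20 canonical amino acids; a sequence is encoded as indices into this string (q = 20).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode(seq: str) -> List[int]:
    """Map an amino-acid string to integer states 0..19."""
    return [AMINO_ACIDS.index(c) for c in seq]


def random_potts(length: int, q: int, coupling_scale: float, seed: int = 0):
    """Draw random fields h[i][a] and symmetric couplings J[i][j][a][b].

    Hypothetical parameterization: fields ~ N(0, 1), couplings ~ N(0, coupling_scale).
    coupling_scale = 0 gives an additive landscape with no epistasis.
    """
    rng = random.Random(seed)
    h = [[rng.gauss(0.0, 1.0) for _ in range(q)] for _ in range(length)]
    J = [[[[0.0] * q for _ in range(q)] for _ in range(length)] for _ in range(length)]
    for i in range(length):
        for j in range(i + 1, length):
            for a in range(q):
                for b in range(q):
                    v = rng.gauss(0.0, coupling_scale)
                    J[i][j][a][b] = v
                    J[j][i][b][a] = v  # keep J_ij(a, b) == J_ji(b, a)
    return h, J


def potts_energy(seq: Sequence[int], h, J) -> float:
    """E(s) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j)."""
    n = len(seq)
    e = sum(h[i][seq[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e += J[i][j][seq[i]][seq[j]]
    return e


def pairwise_epistasis(seq: Sequence[int], i: int, a: int, j: int, b: int, h, J) -> float:
    """Epistasis of the double mutant (i->a, j->b): E(ab) - E(a) - E(b) + E(wt)."""
    def mutate(s, pos, aa):
        t = list(s)
        t[pos] = aa
        return t

    wt = potts_energy(seq, h, J)
    ea = potts_energy(mutate(seq, i, a), h, J)
    eb = potts_energy(mutate(seq, j, b), h, J)
    eab = potts_energy(mutate(mutate(seq, i, a), j, b), h, J)
    return eab - ea - eb + wt
```

For example, `h, J = random_potts(length=8, q=20, coupling_scale=0.5)` followed by `potts_energy(encode("ACDEFGHI"), h, J)` scores one sequence. In a pairwise model the fields cancel in `pairwise_epistasis`, so only the coupling block J_ij contributes; with `coupling_scale=0.0` every double mutant is additive. Whether higher energy is treated as higher or lower fitness is a sign convention left open here.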