

Poster in Workshop: Learning Meaningful Representations of Life

Tuned Quadratic Landscapes for Benchmarking Model-Guided Protein Design

Neil Thomas · Atish Agarwala · David Belanger · Yun Song · Lucy Colwell


Abstract:

Advances in DNA synthesis and sequencing technologies have enabled a novel paradigm of protein design in which machine learning models trained on experimental data guide exploration of a protein sequence landscape. ML-guided directed evolution (MLDE) has the potential not only to build upon the successes of directed evolution, but also to unlock new strategies that make more efficient use of experimental data and trade off between multiple optimization objectives. Building an MLDE pipeline involves many design decisions, ranging from data collection strategies to modeling choices, each of which strongly affects the downstream effectiveness of designed sequences. The cost of collecting experimental data makes benchmarking these pipelines on real data prohibitively difficult, necessitating the development of synthetic landscapes where MLDE strategies can be tested. In this work, we develop a framework called SLIP (“Synthetic Landscape Inference for Proteins”) for constructing synthetic landscapes with tunable difficulty based on Potts models. SLIP is open-source.
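
As a rough illustration of what a Potts-model (quadratic) landscape looks like, the sketch below scores a sequence as a sum of single-site terms plus pairwise terms. It is not taken from SLIP: the function names, the Gaussian random parameters, and the `coupling_scale` knob are all hypothetical choices made for this example, and SLIP's actual parameterization and tuning procedure may differ.

```python
import random


def make_potts_landscape(length, alphabet_size, coupling_scale, seed=0):
    """Draw random Potts parameters (hypothetical setup, not SLIP's).

    fields[i][a]            : contribution of residue a at position i
    couplings[(i, j)][a][b] : contribution of residues a, b at positions i < j
    coupling_scale          : std. dev. of pairwise terms; 0.0 gives a purely
                              additive landscape, larger values add epistasis
    """
    rng = random.Random(seed)
    fields = [
        [rng.gauss(0.0, 1.0) for _ in range(alphabet_size)]
        for _ in range(length)
    ]
    couplings = {
        (i, j): [
            [rng.gauss(0.0, coupling_scale) for _ in range(alphabet_size)]
            for _ in range(alphabet_size)
        ]
        for i in range(length)
        for j in range(i + 1, length)
    }
    return fields, couplings


def fitness(sequence, fields, couplings):
    """Quadratic fitness: single-site terms plus all pairwise terms."""
    single = sum(fields[i][a] for i, a in enumerate(sequence))
    pair = sum(J[sequence[i]][sequence[j]] for (i, j), J in couplings.items())
    return single + pair
```

In this toy setup, raising `coupling_scale` increases the weight of pairwise interactions relative to single-site effects, which is one simple way a landscape could be made harder for an additive model to fit. Sequences are lists of integer residue indices, for example `fitness([0, 1, 2, 0], *make_potts_landscape(4, 3, 1.0))`.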
