Poster in Workshop: Learning Meaningful Representations of Life
Designing and Evolving Neuron-Specific Proteases
Han Spinner · Colin Hemez · Julia McCreary · David Liu · Debora Marks
Directed evolution has substantially advanced protein engineering. However, these experiments are typically seeded with a single sequence, and they are limited by the amount of sequence space they can explore. Here, we aim to develop a machine learning method that learns from the natural distribution of sequences to design diverse seed sequences. We use Botulinum Neurotoxin X (BoNT/X) as a proof of concept for this approach because published data are available from its evolution campaign and because neuron-specific proteases have many therapeutic applications. BoNT/X is also especially well suited to this approach: related BoNT proteases each have their own distinct substrate specificity, which limits the utility of simply drawing seeds from the natural sequences. We hypothesize that our machine learning model can learn the ‘essence’ of the protein family and generate diverse substrate binding domains. We built an alignment of 452 sequences around BoNT/X and show that models trained on these data can separate known beneficial mutations from known deleterious ones. Next, we will use these models to generate sequences and perform new evolution experiments. Finally, we will evaluate the impact of starting with a diverse set of seed sequences versus only one seed sequence. This work will not only create new proteases that can be used for therapeutic indications, but also put forth a new approach for machine-learning-guided evolution experiments.
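To illustrate how a model trained on an alignment can score mutations, here is a minimal sketch. The abstract does not say which model class was used, so this example substitutes a simple site-independent frequency model as an assumption. The toy sequences, the function names (`site_frequencies`, `mutation_score`), and the pseudocount value are all hypothetical and are not taken from the BoNT/X alignment or the authors' method.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def site_frequencies(alignment, pseudocount=1.0):
    """Per-column amino-acid frequencies with additive smoothing.

    Gap characters and any non-standard symbols are ignored.
    """
    length = len(alignment[0])
    freqs = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        freqs.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return freqs

def mutation_score(freqs, wild_type, position, mutant_aa):
    """Log-odds of the mutant residue versus the wild-type residue.

    `position` is 0-based. Higher scores mean the mutant residue is
    more common in the family at that column than lower-scoring ones.
    """
    wt_aa = wild_type[position]
    return math.log(freqs[position][mutant_aa] / freqs[position][wt_aa])

# Hypothetical toy alignment standing in for a real protein family.
toy_alignment = ["MKVLA", "MKVLG", "MRVLA", "MKILA", "MKVLA"]
wild_type = "MKVLA"

freqs = site_frequencies(toy_alignment)
score_seen = mutation_score(freqs, wild_type, 1, "R")    # K->R, observed in the family
score_unseen = mutation_score(freqs, wild_type, 1, "W")  # K->W, never observed
```

Under this assumed model, a substitution observed in the family scores higher than one that never appears. Separating known beneficial from known deleterious mutations would then amount to checking whether the two groups receive different score distributions. Models that capture dependencies between positions would replace `site_frequencies` but could be evaluated the same way.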