Poster in Workshop: Machine Learning in Structural Biology
Conditional Enzyme Generation Using Protein Language Models with Adapters
Jason Yang · Aadyot Bhatnagar · Jeffrey Ruffolo · Ali Madani
The conditional generation of proteins with desired functions and/or properties is a key goal for generative models. Existing methods based on prompting of language models can generate proteins conditioned on target functionality, such as a desired enzyme family. However, these methods are limited to simple, tokenized conditioning and have not been shown to generalize to functions that lie out of distribution. In this study, we propose ProCALM (Protein Conditionally Adapted Language Model), an approach for the conditional generation of enzymes using adapters to protein language models. Our specific implementation of ProCALM involves finetuning ProGen2 to incorporate conditioning representations of enzyme function and taxonomy. ProCALM matches existing methods at conditionally generating sequences from target enzyme families. Impressively, it can also generate within the joint distribution of enzymatic function and taxonomy and can generalize to rare and unseen enzyme families and taxonomies. Overall, ProCALM is a flexible and computationally efficient approach, and we expect that it can be extended to a wide range of generative language models.
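The general idea of conditioning a frozen language model through adapters can be illustrated with a minimal sketch. Note that this is a hypothetical illustration, not ProCALM's actual architecture: all dimensions, weight names, and the bottleneck design are assumptions. A small trainable layer injects a conditioning embedding (e.g., a representation of enzyme function or taxonomy) into the hidden states of each frozen transformer block:

```python
import numpy as np

def conditional_adapter(hidden, cond, W_down, W_cond, W_up):
    """Bottleneck adapter that injects a conditioning vector.

    hidden: (seq_len, d_model) hidden states from a frozen LM block
    cond:   (d_cond,) conditioning embedding (e.g., enzyme family)
    Only the adapter weights (W_down, W_cond, W_up) would be trained;
    the base language model's parameters stay frozen.
    """
    # Project hidden states to a small bottleneck and add the
    # projected conditioning vector (broadcast over the sequence).
    z = hidden @ W_down + cond @ W_cond      # (seq_len, d_bottleneck)
    z = np.maximum(z, 0.0)                   # ReLU nonlinearity
    # Project back up and add residually to the original states.
    return hidden + z @ W_up                 # (seq_len, d_model)

# Toy dimensions, chosen only for illustration.
rng = np.random.default_rng(0)
d_model, d_bottleneck, d_cond, seq_len = 64, 8, 16, 10
h = rng.normal(size=(seq_len, d_model))
c = rng.normal(size=(d_cond,))
out = conditional_adapter(
    h, c,
    W_down=rng.normal(size=(d_model, d_bottleneck)) * 0.1,
    W_cond=rng.normal(size=(d_cond, d_bottleneck)) * 0.1,
    W_up=rng.normal(size=(d_bottleneck, d_model)) * 0.1,
)
print(out.shape)  # (10, 64)
```

Because the adapter is residual and small relative to the base model, this style of conditioning is parameter-efficient and leaves the pretrained model's weights untouched, which is consistent with the flexibility and computational efficiency the abstract claims.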