Poster in Workshop: Foundation Models for Science: Progress, Opportunities, and Challenges
Language Models for Text-guided Protein Evolution
Zhanghan Ni · Shengchao Liu · Animashree Anandkumar
Keywords: [ Large Language Models ] [ Protein Representation Learning ] [ Protein Design ] [ Multimodal Learning ] [ Protein Evolution ]
Language models have demonstrated efficacy in protein design by capturing the distribution of amino acid residues. To advance protein representation learning, biomedical text has been integrated as an additional modality, complementing the existing sequence and structure modalities. The textual modality is crucial because it describes detailed molecular functions and the cellular contexts in which proteins operate. Incorporating this modality aligns with the objective of natural protein evolution: optimizing functional attributes for improved environmental fitness. Consequently, leveraging the reasoning capabilities of large language models enables more informed choices in evolution-based protein design. In this study, we evaluate existing language models on two novel protein evolution tasks: text-guided point mutation and text-guided EC classification switching. Our findings reveal that models capable of conditioning on free-form text design enzyme functions more effectively than those limited to functional keyword annotations. Specifically, incorporating evolutionary context into enzyme function editing yields a 30.06% closer alignment to the desired function.
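As a rough illustration of the text-guided point mutation setup, the sketch below exhaustively scores single-residue substitutions under a text-conditioned scoring function and keeps the best-scoring mutant. The scorer here is a toy placeholder standing in for a text-conditioned protein language model (e.g. a prompt-conditioned sequence log-likelihood); the function names and the prompt are hypothetical, not the paper's actual method.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def score(sequence: str, prompt: str) -> float:
    # Placeholder for a text-conditioned protein language model score
    # (e.g. log-likelihood of the sequence given the text prompt).
    # Toy heuristic: reward residues whose one-letter codes appear
    # (uppercase) in the prompt.
    return float(sum(sequence.count(aa) for aa in prompt if aa in AMINO_ACIDS))


def best_point_mutation(wild_type: str, prompt: str):
    """Score every single-residue substitution of the wild-type sequence
    and return the mutant with the highest text-conditioned score."""
    best_seq, best_score = wild_type, score(wild_type, prompt)
    for pos, aa in product(range(len(wild_type)), AMINO_ACIDS):
        if aa == wild_type[pos]:
            continue  # skip the identity "mutation"
        mutant = wild_type[:pos] + aa + wild_type[pos + 1:]
        s = score(mutant, prompt)
        if s > best_score:
            best_seq, best_score = mutant, s
    return best_seq, best_score


# Example: ask the (toy) model to introduce a tryptophan (W).
mutant, s = best_point_mutation("MKTAYIA", "increase W content")
print(mutant)  # differs from the wild type at exactly one position
```

Swapping the toy `score` for a real prompt-conditioned model likelihood turns this brute-force loop into the evaluation protocol the task implies: a single edit chosen to best satisfy a free-form textual instruction.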