Poster in Workshop: Machine Learning in Structural Biology
Functional Alignment of Protein Language Models via Reinforcement Learning with Experimental Feedback
Nathaniel Blalock · Srinath Seshadri · Philip Romero
Protein language models (pLMs) have emerged as state-of-the-art tools for generative protein sequence design. However, pLMs do not inherently design new sequences whose function exceeds what occurs in nature, a misalignment with the protein engineering objective of redesigning a protein sequence for enhanced function. In natural language processing, Reinforcement Learning from Human Feedback (RLHF) aligned the large language model behind ChatGPT toward preferred responses via supervised fine-tuning (SFT) and proximal policy optimization (PPO). We adapt SFT and PPO to functionally align pLMs using experimental data and call this method Reinforcement Learning with Experimental Feedback (RLXF). We use RLXF to align ESM-2 and a generative variational autoencoder to design five mutant variants of the oxygen-independent fluorescent protein CreiLOV. Using in vivo fluorescence assays, we find that a greater fraction of designs from the aligned models were active and brighter than CreiLOV. Notably, the aligned ESM-2 designed the brightest sequence, with an approximately 16-fold increase in fluorescence intensity. We present RLXF as a versatile method for functionally aligning pLMs with experimental data to achieve state-of-the-art protein sequence redesign.
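To make the two-stage recipe described above concrete, below is a minimal, self-contained PyTorch sketch of the general idea: supervised fine-tuning of an autoregressive sequence policy on experimentally active sequences, followed by a PPO-clip update that maximizes a reward model's predicted function while penalizing divergence from the SFT reference policy. Everything here is an illustrative assumption rather than the authors' implementation: the toy `TinyPolicy` stands in for ESM-2 or the VAE, the untrained `RewardModel` stands in for a predictor fit to experimental fluorescence measurements, and all sizes and hyperparameters are placeholders.

```python
# Hypothetical sketch of an RLXF-style pipeline (SFT + PPO with a reward
# model); models, data, and hyperparameters are illustrative assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, SEQ_LEN = 20, 16  # 20 amino acids; short toy sequence length


class TinyPolicy(nn.Module):
    """Tiny autoregressive stand-in for a protein language model."""
    def __init__(self, d=64):
        super().__init__()
        self.embed = nn.Embedding(VOCAB + 1, d)  # +1 for a BOS token
        self.rnn = nn.GRU(d, d, batch_first=True)
        self.head = nn.Linear(d, VOCAB)

    def logits(self, tokens):  # tokens: (B, L) -> per-position logits
        bos = torch.full((tokens.size(0), 1), VOCAB, dtype=torch.long)
        h, _ = self.rnn(self.embed(torch.cat([bos, tokens[:, :-1]], dim=1)))
        return self.head(h)  # (B, L, VOCAB)

    def log_prob(self, tokens):  # summed log-likelihood of each sequence
        logp = F.log_softmax(self.logits(tokens), dim=-1)
        return logp.gather(-1, tokens.unsqueeze(-1)).squeeze(-1).sum(-1)

    def next_token_logits(self, prefix):  # prefix: (B, t)
        bos = torch.full((prefix.size(0), 1), VOCAB, dtype=torch.long)
        h, _ = self.rnn(self.embed(torch.cat([bos, prefix], dim=1)))
        return self.head(h[:, -1])

    @torch.no_grad()
    def sample(self, n):  # autoregressively sample n candidate designs
        tokens = torch.zeros(n, 0, dtype=torch.long)
        for _ in range(SEQ_LEN):
            nxt = torch.distributions.Categorical(
                logits=self.next_token_logits(tokens)).sample()
            tokens = torch.cat([tokens, nxt.unsqueeze(1)], dim=1)
        return tokens


class RewardModel(nn.Module):
    """Sequence-to-function predictor; in RLXF this would be trained on
    experimental fluorescence data. Here it is an untrained stand-in."""
    def __init__(self, d=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(SEQ_LEN * VOCAB, d), nn.ReLU(), nn.Linear(d, 1))

    def forward(self, tokens):
        return self.net(F.one_hot(tokens, VOCAB).float().flatten(1)).squeeze(-1)


def sft_step(policy, opt, sequences):
    """Supervised fine-tuning on experimentally active sequences."""
    loss = F.cross_entropy(
        policy.logits(sequences).reshape(-1, VOCAB), sequences.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()


def ppo_step(policy, ref_policy, reward_model, opt, n=64, clip=0.2, kl_coef=0.1):
    """One PPO-clip update: sample designs, score them with the reward model,
    and improve the policy while penalizing drift from the SFT reference."""
    seqs = policy.sample(n)
    with torch.no_grad():
        old_logp = policy.log_prob(seqs)
        ref_logp = ref_policy.log_prob(seqs)
        rewards = reward_model(seqs)
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    new_logp = policy.log_prob(seqs)
    ratio = torch.exp(new_logp - old_logp)
    clipped = torch.clamp(ratio, 1 - clip, 1 + clip) * adv
    kl_pen = kl_coef * (new_logp - ref_logp).mean()  # stay near SFT policy
    loss = -torch.min(ratio * adv, clipped).mean() + kl_pen
    opt.zero_grad(); loss.backward(); opt.step()
    return rewards.mean().item()


if __name__ == "__main__":
    policy = TinyPolicy()
    sft_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    # Stand-in for sequences with experimentally measured high activity.
    active_seqs = torch.randint(0, VOCAB, (128, SEQ_LEN))
    for _ in range(50):
        sft_step(policy, sft_opt, active_seqs)

    ref_policy = copy.deepcopy(policy).eval()  # frozen SFT reference
    reward_model = RewardModel()               # would be fit to assay data
    ppo_opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
    for step in range(20):
        mean_r = ppo_step(policy, ref_policy, reward_model, ppo_opt)
        print(f"step {step}: mean predicted reward {mean_r:.3f}")
```

The KL penalty toward the frozen SFT policy mirrors standard RLHF practice: it keeps the aligned model close to the sequence distribution it learned from data while the reward model pushes designs toward higher predicted function.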