Poster in Workshop: Generative AI and Biology (GenBio@NeurIPS2023)
ProteinRL: Reinforcement learning with generative protein language models for property-directed sequence design
Matt Sternke · Joel Karpiak
Keywords: [ Protein Language Models ] [ Reinforcement Learning ] [ Optimization ]
Abstract:
The overarching goal of protein engineering is the design and optimization of proteins customized for specific purposes. Generative protein language models (PLMs) enable $\textit{de novo}$ protein sequence generation; however, current PLMs lack the ability to controllably generate sequences tailored to desired properties. Here we present ProteinRL, a flexible, data-driven reinforcement learning framework for fine-tuning generative PLMs for the $\textit{de novo}$ design of sequences optimized for specific sequence and/or structural properties. We highlight two example cases reflecting realistic protein design goals: a single-objective design of sequences with unusually high charge content, and a multi-objective hit-expansion scenario that diversifies a target sequence with generated sequences having high-confidence structure predictions and high predicted probabilities of soluble expression. In both cases, ProteinRL fine-tuning guides the PLM toward generating sequences optimized for the defined properties, reaching values rarely or never seen in natural sequences or in sequences generated without ProteinRL fine-tuning. The demonstrated success and adaptability of the ProteinRL framework enables the $\textit{de novo}$ design of novel protein sequences optimized for applications across many areas of protein engineering.
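The abstract does not spell out the fine-tuning algorithm, but the general recipe it describes (sample sequences from a generative policy, score them with a property reward, and update the policy toward higher-reward sequences) can be illustrated with a deliberately tiny policy-gradient sketch. The code below is an assumption-laden toy, not the ProteinRL implementation: it replaces the PLM with a single learned categorical distribution over the 20 amino acids, uses the charged-residue fraction as a stand-in for the paper's single-objective charge-content reward, and applies a plain REINFORCE update with a moving-average baseline.

```python
import math
import random

AAS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids
CHARGED = set("DEKR")         # charged residues (Asp, Glu, Lys, Arg)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reward(seq):
    # Stand-in property score: fraction of charged residues in the sequence.
    return sum(aa in CHARGED for aa in seq) / len(seq)

def train(steps=2000, seq_len=30, lr=0.05, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(AAS)  # uniform starting "policy" (toy PLM stand-in)
    baseline = 0.0             # moving-average reward baseline to cut variance
    for _ in range(steps):
        probs = softmax(logits)
        idxs = rng.choices(range(len(AAS)), weights=probs, k=seq_len)
        seq = "".join(AAS[i] for i in idxs)
        r = reward(seq)
        baseline = 0.9 * baseline + 0.1 * r
        adv = r - baseline
        # REINFORCE gradient of sum_t log p(a_t) w.r.t. the logits:
        # counts[a] - seq_len * probs[a]
        counts = [0] * len(AAS)
        for i in idxs:
            counts[i] += 1
        for a in range(len(AAS)):
            logits[a] += lr * adv * (counts[a] - seq_len * probs[a])
    return logits

logits = train()
probs = softmax(logits)
charged_mass = sum(p for p, aa in zip(probs, AAS) if aa in CHARGED)
```

Starting from a uniform policy (charged residues carry 4/20 = 0.2 of the probability mass), the update shifts mass onto D, E, K, and R, mirroring how RL fine-tuning steers a generative model toward high-charge sequences. A practical system would instead fine-tune an autoregressive PLM and typically add a KL penalty against the pretrained model to preserve sequence plausibility; neither detail is claimed here to match ProteinRL's actual design.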