

Poster in Workshop: Machine Learning in Structural Biology

Protein Language Model Fitness is a Matter of Preference

Cade Gordon · Amy Lu · Pieter Abbeel

presentation: Machine Learning in Structural Biology
Sun 15 Dec 8:30 a.m. PST — 5 p.m. PST

Abstract:

Although protein language models (pLMs) have been used successfully to design proteins for therapeutic and research purposes, it remains unclear under what conditions they succeed or fail. We show that the likelihood a pLM assigns to a sequence indicates the model's zero-shot fitness prediction capability. To determine what data causes a sequence to be likely, we use influence functions and find that homologous neighbors retrieved by protein search are most responsible for increasing sequence likelihood. This motivates finetuning only on low-likelihood sequences to improve the selection of beneficial mutations, and thereby protein engineering capabilities.
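
As a rough illustration of the quantities the abstract refers to, the sketch below scores a sequence by summing per-position log-probabilities and picks out low-likelihood sequences. It is a toy under stated assumptions, not the authors' method: the per-position probability tables stand in for the output of a real pLM, the log-likelihood ratio is one common way to score a mutation zero-shot, and the function names and the threshold are hypothetical.

```python
import math


def log_likelihood(sequence, position_probs):
    """Sum the log-probability of each observed residue.

    position_probs is a list with one dict per position, mapping a residue
    letter to its probability. Here it is a stand-in for pLM outputs.
    """
    if len(sequence) != len(position_probs):
        raise ValueError("sequence and position_probs must have equal length")
    return sum(math.log(probs[aa]) for aa, probs in zip(sequence, position_probs))


def mutation_score(wild_type, mutant, position_probs):
    """Zero-shot score: log-likelihood of the mutant minus that of the wild type."""
    return log_likelihood(mutant, position_probs) - log_likelihood(
        wild_type, position_probs
    )


def low_likelihood_subset(scored, threshold):
    """Return the sequences whose log-likelihood is below a chosen threshold.

    scored maps sequence -> log-likelihood; the threshold is an assumption.
    """
    return [seq for seq, ll in scored.items() if ll < threshold]


if __name__ == "__main__":
    # Toy two-position "model" over a two-letter alphabet.
    probs = [{"A": 0.5, "G": 0.5}, {"A": 0.25, "G": 0.75}]
    scored = {seq: log_likelihood(seq, probs) for seq in ("AA", "AG")}
    print(mutation_score("AA", "AG", probs))
    print(low_likelihood_subset(scored, threshold=-1.0))
```

In this sketch a positive `mutation_score` means the stand-in model prefers the mutant over the wild type, and `low_likelihood_subset` shows the kind of filter one might apply before finetuning.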
