Skip to yearly menu bar Skip to main content


Poster

MSA Generation with Seqs2Seqs Pretraining: Advancing Protein Structure Predictions

LE ZHANG · Jiayang Chen · Tao Shen · Yu Li · Siqi Sun

East Exhibit Hall A-C #1107
[ ]
Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

Deep learning models like AlphaFold2 have revolutionized protein structure prediction, achieving unprecedented accuracy. However, the dependence on robust multiple sequence alignments (MSAs) continues to pose a challenge, especially for proteins that lack a wealth of homologous sequences. To overcome this limitation, we introduce MSA-Generator, a self-supervised generative protein language model. Trained on a sequence-to-sequence task using an automatically constructed dataset, MSA-Generator employs protein-specific attention mechanisms to harness large-scale protein databases, generating virtual MSAs that enrich existing ones and boost prediction accuracy. Our experiments on CASP14 and CASP15 benchmarks reveal significant improvements in LDDT scores, particularly for complex and challenging sequences, enhancing the performance of both AlphaFold2 and RoseTTAFold. The code is released at \url{https://github.com/lezhang7/MSAGen}.

Chat is not available.