

Poster in Workshop: AI for New Drug Modalities

A Foundation Model for RNA Function and Structure Prediction

Shuxian Zou · Tianhua Tao · Sazan Mahbub · Caleb Ellington · Robin Algayres · Yonghao Zhuang · Hongyi Wang · Eric Xing · Le Song


Abstract:

Originally marginalized as an intermediate in the flow of information from DNA to protein, RNA has become the star of modern biology, holding the key to precision therapeutics, genetic engineering, evolutionary origins, and our understanding of fundamental cellular processes. Yet RNA is as mysterious as it is prolific: it serves as an information store, a messenger, and a catalyst, and it spans many undercharacterized functional and structural classes. Deciphering the language of RNA is important not only for a mechanistic understanding of its biological functions but also for accelerating drug design. Toward this goal, we introduce rnaFoundation, a 1.6-billion-parameter RNA foundation model (FM) trained on 42 million non-coding RNA (ncRNA) sequences at single-nucleotide resolution. rnaFoundation achieves state-of-the-art performance on a comprehensive set of tasks, including structure prediction, genetic regulation, molecular function across species, and RNA sequence design. After domain adaptation, rnaFoundation learns to model essential aspects of protein translation that protein language models, which have received widespread attention in recent years, do not capture. More broadly, rnaFoundation hints at the generality of biological sequence modeling and at the potential to leverage the information flow of the central dogma to improve many biomolecular representations simultaneously. We will open-source our model, data, and code for the community.
