Poster in Workshop: AI for New Drug Modalities

Latent Diffusion Models for Controllable RNA Sequence Generation

Kaixuan Huang · Yukang Yang · Kaidi Fu · Yanyi Chu · Le Cong · Mengdi Wang


Abstract:

This work presents RNADiffusion, a latent diffusion model for generating and optimizing discrete RNA sequences of variable lengths. RNA is a key intermediary between DNA and protein, exhibiting high sequence diversity and complex three-dimensional structures that support a wide range of functions. We use pretrained BERT-type models to encode raw RNA sequences into token-level, biologically meaningful representations. A Query Transformer compresses these representations into a set of fixed-length latent vectors, and an autoregressive decoder is trained to reconstruct RNA sequences from these latent variables. We then train a continuous diffusion model in this latent space. To enable optimization, we integrate the gradients of reward models, which serve as surrogates for RNA functional properties, into the backward diffusion process, thereby generating RNAs with high reward scores. Empirical results confirm that RNADiffusion generates non-coding RNAs that align with natural distributions across various biological metrics. Furthermore, fine-tuning on mRNA 5’ untranslated regions (5’-UTRs) optimizes generated sequences for high translation efficiency. Our guided diffusion model effectively generates diverse 5’-UTRs with high Mean Ribosome Loading (MRL) and Translation Efficiency (TE), outperforming baselines in balancing the trade-off between reward and structural stability. These findings hold promise for advancing RNA sequence-function research and therapeutic RNA design.
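To make the two key components of the abstract concrete, the sketch below shows (1) a Query Transformer that compresses variable-length token embeddings into a fixed number of latent vectors via cross-attention, and (2) a single reward-gradient-guided reverse diffusion step in that latent space. This is a minimal illustration, not the authors' implementation; all module names, dimensions, the noise schedule, and the guidance scale are assumptions.

```python
# Minimal sketch (PyTorch) of the pipeline described in the abstract.
# Assumptions: d_model, n_queries, the DDPM schedule, and guidance_scale are illustrative.
import torch
import torch.nn as nn


class QueryTransformer(nn.Module):
    """Compress (batch, seq_len, d_model) token features into (batch, n_queries, d_model)."""

    def __init__(self, d_model: int = 256, n_queries: int = 16, n_heads: int = 4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_model * 4), nn.GELU(),
                                nn.Linear(d_model * 4, d_model))

    def forward(self, token_feats: torch.Tensor) -> torch.Tensor:
        # Learned queries attend over the encoder's token features (cross-attention pooling).
        q = self.queries.unsqueeze(0).expand(token_feats.size(0), -1, -1)
        latents, _ = self.attn(q, token_feats, token_feats)
        return latents + self.ff(latents)


def guided_reverse_step(x_t, t, eps_model, reward_model, betas, guidance_scale=1.0):
    """One DDPM-style reverse step (t is a Python int) with the gradient of a
    differentiable reward surrogate added to the posterior mean."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    t_batch = torch.full((x_t.size(0),), t, dtype=torch.long, device=x_t.device)
    eps = eps_model(x_t, t_batch)
    # Standard DDPM posterior mean for x_{t-1}.
    mean = (x_t - (1.0 - alphas[t]) / (1.0 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
    # Reward guidance: shift the mean along the gradient of the reward surrogate.
    with torch.enable_grad():
        x_req = x_t.detach().requires_grad_(True)
        grad = torch.autograd.grad(reward_model(x_req).sum(), x_req)[0]
    mean = mean + guidance_scale * grad
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + betas[t].sqrt() * noise
```

In this sketch the reward model scores the noisy latents directly; in practice the guidance could instead be applied to a denoised latent estimate before decoding, and the decoder (not shown) would map the final latents back to RNA tokens autoregressively.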
