Poster in Workshop: Machine Learning in Structural Biology
Fine-Tuning Discrete Diffusion Models via Reward Optimization: Applications to DNA and Protein Design
Chenyu Wang · Masatoshi Uehara · Yichun He · Amy Wang · Tommaso Biancalani · Avantika Lal · Tommi Jaakkola · Sergey Levine · Hanchen Wang · Aviv Regev
Recent studies have demonstrated the strong empirical performance of diffusion models over discrete sequences (i.e., discrete diffusion models) across domains such as natural language and biological sequence generation. However, practical tasks often require optimizing specific objectives in addition to modeling the conditional distribution, such as protein stability in inverse folding. To address this, we start from pre-trained discrete diffusion models that generate "natural" sequences and reward models that map sequences to the objectives of interest. We then frame reward maximization as a reinforcement learning (RL) problem, regularized by the KL divergence to the pre-trained model to preserve sequence naturalness. To solve it, we propose a novel algorithm that backpropagates the reward directly through entire sampling trajectories, which are made differentiable via the Gumbel-Softmax trick. Our theoretical analysis and empirical results on DNA and protein design demonstrate the effectiveness of this approach.
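To make the core idea concrete, below is a minimal PyTorch-style sketch of KL-regularized reward fine-tuning with a Gumbel-Softmax relaxation of the sampling trajectory. It is illustrative only: the interfaces `model`, `pretrained`, and `reward_model`, the helpers `gumbel_softmax_sample` and `finetune_step`, and the hyperparameters `alpha`, `tau`, and `num_steps` are assumptions for this sketch, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F


def gumbel_softmax_sample(logits, tau=1.0, hard=True):
    """Draw a relaxed one-hot sample from categorical logits.

    With hard=True the forward pass emits a discrete one-hot sample while
    gradients flow through the soft relaxation (straight-through), so a
    downstream reward can be backpropagated to the logits.
    """
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-9) + 1e-9)
    y_soft = F.softmax((logits + gumbel) / tau, dim=-1)
    if hard:
        index = y_soft.argmax(dim=-1, keepdim=True)
        y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
        return y_hard - y_soft.detach() + y_soft  # discrete forward, soft backward
    return y_soft


def finetune_step(model, pretrained, reward_model, x_T, alpha=0.1, tau=1.0, num_steps=50):
    """One gradient step of KL-regularized reward fine-tuning (sketch).

    `model` and `pretrained` are assumed to map a (relaxed) one-hot sequence
    of shape (batch, length, vocab) and a timestep to per-position logits;
    `reward_model` maps such a sequence to a scalar reward.
    Objective (maximized): E[r(x)] - alpha * KL(p_model || p_pretrained).
    """
    x = x_T  # fully noised one-hot sequence, shape (batch, length, vocab)
    kl = 0.0
    for t in reversed(range(num_steps)):
        logits = model(x, t)
        with torch.no_grad():
            logits_pre = pretrained(x, t)
        # Accumulate KL to the pre-trained denoiser to keep samples "natural".
        kl = kl + (F.softmax(logits, -1)
                   * (F.log_softmax(logits, -1) - F.log_softmax(logits_pre, -1))
                   ).sum(dim=(-1, -2)).mean()
        # Relaxed sampling keeps the whole trajectory differentiable.
        x = gumbel_softmax_sample(logits, tau=tau)
    loss = -(reward_model(x).mean() - alpha * kl)
    loss.backward()
    return loss
```

In this sketch, `alpha` trades off reward maximization against staying close to the pre-trained model, and `tau` controls how sharp the Gumbel-Softmax relaxation is; both would need tuning in practice.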