Poster
in
Workshop: Machine Learning in Structural Biology Workshop
Protein generation with evolutionary diffusion: sequence is all you need
Sarah Alamdari · Nitya Thakkar · Rianne van den Berg · Alex X Lu · Nicolo Fusi · Ava Amini · Kevin Yang
Diffusion models have demonstrated the ability to generate biologically plausible proteins that are dissimilar to any proteins seen in nature, enabling unprecedented capability and control in de novo protein design. However, current state-of-the-art diffusion models generate protein structures, which limits the scope of their training data and restricts generations to a small and biased subset of protein space. We introduce a general-purpose diffusion framework, EvoDiff, that combines evolutionary-scale data with the distinct conditioning capabilities of diffusion models for controllable protein generation in sequence space. EvoDiff generates high-fidelity, diverse, and structurally-plausible proteins that cover natural sequence and functional space. Critically, EvoDiff can generate proteins inaccessible to structure-based models, such as those with disordered regions, and design scaffolds for functional structural motifs, demonstrating the universality of our sequence-based formulation. We envision that EvoDiff will expand capabilities in protein engineering beyond the structure-function paradigm toward programmable, sequence-first design. All code and models will be open-source.