Poster in Workshop: Generative AI and Biology (GenBio@NeurIPS2023)
DSMBind: an unsupervised generative modeling framework for binding energy prediction
Wengong Jin · Caroline Uhler · Nir HaCohen
Keywords: [ denoising score matching ] [ protein-protein binding ] [ antibody-antigen binding ] [ protein-ligand binding ] [ Energy-Based Models ]
Predicting the binding between proteins and other molecules is a core question in biology. Geometric deep learning is a promising paradigm for protein-ligand or protein-protein binding energy prediction, but its accuracy is limited by the size of the training data, as high-throughput binding assays are expensive. Unsupervised learning, such as protein language models, is particularly useful in this setting because it does not need experimental binding energy data for training. In this work, we propose DSMBind, a new generative modeling framework for protein complex structures, and show that the likelihood of a crystal structure is highly correlated with its binding energy. Specifically, DSMBind learns an energy-based model from a training set of unlabeled crystal structures via SE(3) denoising score matching (DSM), where we perturb a protein complex via random rotations of its backbone and side chains. We find that the learned energy is highly correlated with experimental binding affinity across multiple benchmarks, including protein-ligand binding, antibody-antigen binding, and protein-protein binding mutation effect prediction. DSMBind not only outperforms unsupervised learning methods based on protein language models or inverse folding, but also matches the performance of state-of-the-art supervised models trained on experimental binding data.
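To make the training idea concrete, here is a minimal sketch of denoising score matching for an energy-based model. It is not the DSMBind implementation: it assumes plain Gaussian noise on Cartesian coordinates in place of the SE(3) rotational perturbation described above, and it uses a toy harmonic energy in place of a learned network. The names `energy`, `energy_grad`, `dsm_loss`, `center`, and `stiffness` are all hypothetical.

```python
import numpy as np

def energy(coords, center, stiffness):
    """Toy energy: a harmonic well around `center`.
    Hypothetical stand-in for a learned energy network."""
    return 0.5 * stiffness * np.sum((coords - center) ** 2)

def energy_grad(coords, center, stiffness):
    """Analytic gradient of the toy energy with respect to the coordinates."""
    return stiffness * (coords - center)

def dsm_loss(clean_coords, center, stiffness, sigma, rng, n_samples=256):
    """Monte Carlo estimate of the Gaussian denoising score matching loss.

    Assumption: noise is additive Gaussian on coordinates, not the
    rotational SE(3) noise mentioned in the abstract.
    """
    losses = []
    for _ in range(n_samples):
        # Perturb the clean structure with isotropic Gaussian noise.
        eps = rng.standard_normal(clean_coords.shape)
        noisy = clean_coords + sigma * eps
        # Score implied by the energy model: minus the energy gradient.
        model_score = -energy_grad(noisy, center, stiffness)
        # Score of the Gaussian perturbation kernel q(noisy | clean).
        target_score = -(noisy - clean_coords) / sigma**2
        losses.append(np.sum((model_score - target_score) ** 2))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
clean = rng.standard_normal((5, 3))  # five made-up atoms in 3D
sigma = 0.5

# An energy whose score matches the noise kernel drives the loss to zero.
matched = dsm_loss(clean, center=clean, stiffness=1.0 / sigma**2, sigma=sigma, rng=rng)
# A mismatched energy gives a strictly larger loss.
mismatched = dsm_loss(clean, center=clean, stiffness=1.0, sigma=sigma, rng=rng)
print(matched, mismatched)
```

In this toy setting the loss is minimized when the energy gradient points from the noisy structure back toward the clean one. That is the sense in which an energy trained this way comes to favor unperturbed crystal structures.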