Poster in Workshop: Machine Learning in Structural Biology
Allo-Allo: Data-efficient prediction of allosteric sites
Tianze Dong · Christopher Kan · Kapil Devkota · Rohit Singh
Allostery, a fundamental structural mechanism in which ligand binding at one protein site affects protein function at another site, plays a crucial role in key drug-target proteins such as GPCRs. Unfortunately, existing methods for predicting allosteric sites have limited performance; they are particularly constrained by scarce ground-truth experimental data. We introduce Allo-Allo, a data-efficient, sequence-based method that predicts allosteric sites by leveraging protein language models (PLMs). Homing in on ESM-2 attention heads that capture allosteric residue associations, Allo-Allo achieves a 67% higher AUPRC than state-of-the-art methods. Our innovative, data-efficient pipeline not only outperforms alternative, commonly used PLM-based prediction architectures but also generalizes well. Notably, mutations in Allo-Allo-predicted sites show significant association with elevated disease risk scores from AlphaMissense, highlighting the method's translational potential. Beyond Allo-Allo's biological and translational applicability, its architecture presents a powerful framework for other data-scarce problems in protein analysis.
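To make the idea of selecting attention heads that capture allosteric residue associations more concrete, here is a minimal sketch of one way such heads could be ranked. It is not the authors' pipeline: the scoring rule (the average fraction of each residue's attention that lands on known allosteric residues), the function names `head_allosteric_score` and `rank_heads`, and the toy attention maps are all assumptions made for illustration. In practice the maps would come from a PLM such as ESM-2 rather than being written by hand.

```python
from typing import List, Sequence, Set, Tuple

# One attention map per head: an L x L matrix where row i holds the
# attention that residue i pays to every residue j (hypothetical layout).
AttentionMap = Sequence[Sequence[float]]


def head_allosteric_score(attn: AttentionMap, allosteric: Set[int]) -> float:
    """Mean fraction of each row's attention that falls on allosteric residues.

    Illustrative scoring rule only; the abstract does not specify one.
    """
    if not attn:
        return 0.0
    total = 0.0
    for row in attn:
        row_sum = sum(row)
        if row_sum > 0:
            total += sum(row[j] for j in allosteric) / row_sum
    return total / len(attn)


def rank_heads(
    heads: Sequence[AttentionMap], allosteric: Set[int]
) -> List[Tuple[int, float]]:
    """Return (head_index, score) pairs sorted from highest to lowest score."""
    scored = [(h, head_allosteric_score(a, allosteric)) for h, a in enumerate(heads)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy example: a 4-residue protein where residue 2 is a known allosteric site.
uniform_head = [[0.25, 0.25, 0.25, 0.25] for _ in range(4)]  # no preference
focused_head = [[0.1, 0.1, 0.7, 0.1] for _ in range(4)]      # attends to residue 2

ranking = rank_heads([uniform_head, focused_head], allosteric={2})
for head_index, score in ranking:
    print(f"head {head_index}: score {score:.2f}")
```

In this toy setting the head that concentrates attention on the labeled residue ranks first. A scheme of this kind needs only a small set of labeled proteins to choose heads, which is one plausible reading of why a head-selection approach could be data-efficient.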