

Poster in Workshop: Machine Learning in Structural Biology Workshop

Evaluating Representation Learning on the Protein Structure Universe

Arian Jamasb · Alex Morehead · Zuobai Zhang · Chaitanya K. Joshi · Kieran Didi · Simon Mathis · Charles Harris · Jian Tang · Jianlin Cheng · Pietro Liò · Tom Blundell


Abstract:

We introduce ProteinWorkshop, a comprehensive and rigorous benchmark suite for evaluating protein structure representation learning methods. We provide large-scale pretraining and downstream tasks comprising both experimental and predicted structures, offering a balanced challenge to representation learning algorithms. We demonstrate the utility of our benchmark by systematically evaluating state-of-the-art protein-specific and generic geometric Graph Neural Networks, and the extent to which they benefit from pretraining. We find that: (1) pretraining consistently improves the performance of both rotation-invariant and equivariant geometric models; (2) equivariant models appear to benefit more from pretraining than invariant models. Our open-source codebase lowers the barrier to entry for working with large structure-based datasets by providing utilities for constructing new tasks directly from the entire PDB, as well as storage-efficient dataloaders for large-scale predicted structure databases, including AlphaFoldDB and ESM Atlas. ProteinWorkshop is available at: https://anonymous.4open.science/r/ProteinWorkshop-B8F5.
