Poster in Workshop: Machine Learning in Structural Biology Workshop
Jointly Embedding Protein Structures and Sequences through Residue Level Alignment
Foster Birnbaum · Saachi Jain · Amy Keating · Aleksander Madry
The relationships between protein sequences, structures, and their functions are determined by complex codes that scientists aim to decipher. In particular, while structures contain key information about the protein's biochemical functions, they are often experimentally difficult to obtain. In contrast, protein sequences are abundant but are a step removed from molecular function. In this paper, we propose Residue Level Alignment (RLA) — a self-supervised objective for aligning structure and sequence embedding spaces. By situating structure and sequence encoders within the same latent space, RLA allows the structure encoder to leverage large sequence databases and enriches the sequence encoder with spatial information. Moreover, our framework enables us to measure the similarity between a structure and sequence by comparing their RLA embeddings: we show how RLA similarity scores can be used for binder design by screening for appropriate docking candidates for a given protein-protein or protein-peptide interaction.
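As a rough illustration of what aligning structure and sequence embeddings at the residue level could look like, the sketch below uses a symmetric contrastive (InfoNCE-style) loss over per-residue embeddings. The abstract does not state the exact form of the RLA objective, so the loss, the temperature value, and the function names `residue_contrastive_loss` and `similarity_score` are assumptions made for illustration only. Plain NumPy arrays stand in for the outputs of the structure and sequence encoders.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row (one residue embedding) to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def _log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def residue_contrastive_loss(struct_emb, seq_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss over residues (assumed form, not the paper's).

    struct_emb, seq_emb: arrays of shape (n_residues, dim), where row i of each
    array is the embedding of the same residue i from the two encoders.
    Matching residues are positives; all other residues act as negatives.
    """
    s = l2_normalize(struct_emb)
    q = l2_normalize(seq_emb)
    logits = s @ q.T / temperature  # (n_residues, n_residues) cosine similarities
    idx = np.arange(len(s))
    # Structure -> sequence: each row should pick out its own column.
    loss_s2q = -_log_softmax(logits, axis=1)[idx, idx].mean()
    # Sequence -> structure: each column should pick out its own row.
    loss_q2s = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_s2q + loss_q2s)


def similarity_score(struct_emb, seq_emb):
    """Mean cosine similarity between corresponding residue embeddings.

    A hypothetical stand-in for a structure-sequence similarity score that
    could be used to rank candidate sequences against a given structure.
    """
    s = l2_normalize(struct_emb)
    q = l2_normalize(seq_emb)
    return float((s * q).sum(axis=-1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    structure = rng.normal(size=(16, 8))        # toy per-residue structure embeddings
    matched = structure.copy()                  # perfectly aligned sequence embeddings
    mismatched = structure[np.roll(np.arange(16), 1)]  # residues shifted by one
    print("loss (matched):   ", residue_contrastive_loss(structure, matched))
    print("loss (mismatched):", residue_contrastive_loss(structure, mismatched))
    print("score (matched):   ", similarity_score(structure, matched))
    print("score (mismatched):", similarity_score(structure, mismatched))
```

Under these assumptions, a screening step would compute `similarity_score` for each candidate sequence against the target structure's embeddings and rank the candidates by score. How the actual method aggregates per-residue similarities is not specified in the abstract.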