Poster in Workshop: 5th Workshop on Self-Supervised Learning: Theory and Practice
EmbedSimScore: Advancing Protein Similarity Analysis with Structural and Contextual Embeddings
Gourab Saha · Toki Tahmid · Md. Shamsuzzoha Bayzid
Accurately computing protein similarity is challenging because local substructures and global structure interact in intricate ways within a protein molecule. Traditional metrics such as TM-score focus on aligning the global structures of two proteins algorithmically, and can therefore overlook critical local-global relations and contextual comparisons. We introduce EmbedSimScore, a novel self-supervised method that generates superior structural and contextual embeddings by jointly considering the local substructures and the global structure of proteins. Using contrastive language-structure pre-training (CLSP) and structural contrastive learning, EmbedSimScore captures comprehensive features across different scales of protein structure. These embeddings provide a more precise and holistic means of computing protein similarity, and they identify intrinsic relations among proteins that traditional approaches overlook.
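As a rough illustration of the two ideas named in the abstract, the sketch below shows a generic CLIP-style symmetric contrastive loss between paired embeddings and a cosine similarity score between two embedding vectors. It is not the authors' implementation: the abstract does not specify the encoders, the loss, or the scoring function, so the function names (`cosine_similarity`, `symmetric_contrastive_loss`), the temperature of 0.07, and the assumption that the "language" side and the "structure" side each yield one fixed-length vector per protein are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine similarity of two embeddings (assumed scoring function)."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(lang_embs, struct_embs, temperature=0.07):
    """Generic symmetric InfoNCE loss over a batch of paired embeddings.

    lang_embs[i] and struct_embs[i] are assumed to describe the same
    protein; every other pairing in the batch is treated as a negative.
    """
    if len(lang_embs) != len(struct_embs):
        raise ValueError("batches must contain the same number of items")
    logits = [
        [cosine_similarity(l, s) / temperature for s in struct_embs]
        for l in lang_embs
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the language-to-structure and structure-to-language terms.
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))
```

Under these assumptions, a batch whose paired embeddings point in the same direction gives a loss near zero, while a batch with the pairs swapped gives a large loss. After training, a similarity score between two proteins could be read off as `cosine_similarity` of their embeddings.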