

Poster

ProtGO: Function-Guided Protein Modeling for Unified Representation Learning

Bozhen Hu · Cheng Tan · Yongjie Xu · Zhangyang Gao · Jun Xia · Lirong Wu · Stan Z. Li

Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

Protein representation learning is indispensable for downstream applications such as drug design and structure prediction. However, learning effective protein representations is challenging because the data span diverse modalities, including sequence, structure, domains, motifs, and function annotations. Despite the impressive capabilities of large language models in modelling biomedical text, there remains a need for a framework that integrates these modalities, with particular focus on three critical aspects of protein information: sequence, structure, and function. The differences in data scale among these modalities must also be addressed. To tackle these challenges, we introduce ProtGO, a unified model in which a teacher network, equipped with a customized graph neural network (GNN) and a Gene Ontology (GO) encoder, learns hybrid embeddings. Notably, the student network shares the same GNN module and does not require function annotations as additional input. We use a domain adaptation method to approximate distributions and thereby guide the training of the teacher-student framework. This approach aligns distributions learned from latent representations rather than individual samples. Benchmark experiments show that ProtGO significantly outperforms state-of-the-art baselines, highlighting the advantages of the proposed unified framework.
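The abstract does not say which domain adaptation method is used, so the following is only an illustrative sketch of what aligning distributions instead of individual samples can look like. It uses the squared maximum mean discrepancy (MMD) with an RBF kernel as a stand-in discrepancy measure. The function names, the toy Gaussian "embeddings", and the bandwidth heuristic are all assumptions made for illustration and are not taken from ProtGO.

```python
import math
import random


def rbf(x, y, gamma):
    """RBF kernel between two embedding vectors."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(a, b, gamma):
    """Average kernel value over all cross pairs of two batches."""
    return sum(rbf(x, y, gamma) for x in a for y in b) / (len(a) * len(b))


def mmd2(teacher_z, student_z, gamma=None):
    """Biased estimate of squared MMD between two batches of embeddings.

    The batches may have different sizes: no teacher row is paired with
    any particular student row, only the two empirical distributions
    are compared.
    """
    if gamma is None:
        # Heuristic bandwidth (an assumption, not a value from the paper).
        gamma = 1.0 / len(teacher_z[0])
    return (
        mean_kernel(teacher_z, teacher_z, gamma)
        + mean_kernel(student_z, student_z, gamma)
        - 2.0 * mean_kernel(teacher_z, student_z, gamma)
    )


def sample_batch(rng, n, dim, shift=0.0):
    """Toy stand-in for latent embeddings: Gaussian rows with a mean shift."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
teacher = sample_batch(rng, 64, 8)                 # hypothetical teacher latents
student_close = sample_batch(rng, 32, 8)           # same distribution, other samples
student_far = sample_batch(rng, 32, 8, shift=3.0)  # shifted distribution

print("close:", mmd2(teacher, student_close))
print("far:  ", mmd2(teacher, student_far))
```

In a training loop, a differentiable version of such a discrepancy could serve as a loss term that pulls the student's latent distribution toward the teacher's. Because the teacher and student batches here differ in size and content, the sketch also shows that no sample-to-sample correspondence is needed.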
