Poster in Workshop: Machine Learning in Structural Biology Workshop
Large-scale self-supervised pre-training on protein three-dimensional structures
Ilya Senatorov
Recent developments in protein structure prediction have led to a drastic increase in the number of available protein three-dimensional structures. This presents both a challenge and an opportunity: finding suitable approaches for using these new datasets in various machine learning settings. In this paper, we propose STEP (STructural Embedding of Proteins), a self-supervised learning approach for creating meaningful embeddings of protein structures. We study various approaches to this problem, including deep metric learning as well as simple label prediction tasks. We show that STEP outperforms existing models in a variety of downstream tasks, including the prediction of drug-target interactions. For especially challenging tasks, such as predicting drugs for new proteins, our model improves on previous methods by up to 0.1 AUROC.