Poster in Workshop: Machine Learning in Structural Biology
Adapting protein language models for rapid DTI prediction
Samuel Sledzieski · Rohit Singh · Lenore J Cowen · Bonnie Berger
We consider the problem of sequence-based drug-target interaction (DTI) prediction, showing that a straightforward deep learning architecture that leverages an appropriately pre-trained protein embedding outperforms state-of-the-art approaches, achieving higher accuracy and an order of magnitude faster training. The protein embedding we use is constructed from language models trained first on the entire corpus of protein sequences and then on the corpus of protein-protein interactions. This multi-tier pre-training customizes the embedding with implicit protein structure and binding information that is especially useful in few-shot (small training data set) and zero-shot (unseen proteins or drugs) settings. The approach can be extended with additional neural network layers when the training data size allows for greater model complexity. We anticipate that such transfer learning approaches will facilitate rapid prototyping of DTI models, especially in low-N scenarios.
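To make the idea of a lightweight prediction head on top of a frozen, pre-trained protein embedding concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the architecture, so everything here is an assumption: the shared latent space, the dot-product scoring, the binary drug fingerprint, the dimensions, and all function names are hypothetical. The protein embedding is a random stand-in for what a pre-trained language model would produce.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Return a randomly initialised (n_out x n_in) weight matrix and a zero bias."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def project(x, layer):
    """Apply the affine map y = Wx + b to a single feature vector."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def predict_interaction(protein_embedding, drug_features, protein_head, drug_head):
    """Score one (protein, drug) pair as a probability in (0, 1).

    Assumption: both inputs are projected into a shared latent space and
    compared with a dot product followed by a sigmoid.
    """
    p = project(protein_embedding, protein_head)
    d = project(drug_features, drug_head)
    logit = sum(pi * di for pi, di in zip(p, d))
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
PROTEIN_DIM, DRUG_DIM, LATENT_DIM = 16, 8, 4  # hypothetical sizes

# Only these two small heads would be trained; the embedding stays frozen.
protein_head = make_linear(PROTEIN_DIM, LATENT_DIM, rng)
drug_head = make_linear(DRUG_DIM, LATENT_DIM, rng)

# Stand-in for a pre-trained language-model embedding of a protein sequence.
protein_embedding = [rng.gauss(0.0, 1.0) for _ in range(PROTEIN_DIM)]
# Stand-in for a binary molecular fingerprint of a drug.
drug_features = [float(rng.random() < 0.5) for _ in range(DRUG_DIM)]

score = predict_interaction(protein_embedding, drug_features,
                            protein_head, drug_head)
print(f"predicted interaction probability: {score:.3f}")
```

Because the heads in this sketch are small and the embedding is never updated, training touches few parameters, which is one plausible reading of why such a design could train quickly and cope with small data sets. Extra layers could be inserted into `project` when more training data is available.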