Poster in Workshop: Machine Learning in Structural Biology
Understanding Protein-DNA Interactions by Paying Attention to Protein and Genomics Foundation Models
Dhruva Rajwade · Erica Wang · Aryan Satpathy · Alexander Brace · Hongyu Guo · Arvind Ramanathan · Shengchao Liu · Animashree Anandkumar
Protein-nucleic acid (NA) interactions play a key role in gene regulation. There is strong motivation to understand these interactions, with the goal of engineering them to solve biological problems. Current methods for quantifying protein-NA interactions are mainly experimental and cost considerable time and money. To mitigate this, deep learning methods have recently been applied to predict protein-DNA contacts. Although promising, these methods are computationally expensive and limited in accuracy. To address these challenges, we propose Seq2Contact, a novel method that predicts protein-NA binding at the level of single nucleotides (DNA) and single amino acids (protein). Seq2Contact uses protein and DNA foundation models to obtain amino acid-specific and nucleotide-specific embeddings, then applies a cross-attention module to produce binding contact maps. We split the train and test data with a sequence-similarity-based clustering method and show empirically that Seq2Contact achieves state-of-the-art performance, beating existing baselines by almost 20% (F1-score) on protein-DNA binding prediction. Our method is also computationally more efficient, with up to 80% lower memory cost and more than 90% less inference time. Code is available at https://anonymous.4open.science/r/Protein-DNA-Attention-8602/README.md
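To make the cross-attention idea concrete, the sketch below scores every (amino acid, nucleotide) pair with a scaled dot product between projected embeddings and squashes each score into a contact probability. It is a minimal illustration and not the Seq2Contact implementation. The abstract does not specify the scoring function, so the sigmoid read-out, the projection matrices `w_q` and `w_k`, all dimensions, and the random values standing in for foundation-model embeddings are assumptions.

```python
import math
import random


def linear(x, w):
    """Project each row of x (n x d_in) with the weight matrix w (d_in x d_out)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def contact_map(protein_emb, dna_emb, w_q, w_k):
    """Return a (residues x nucleotides) matrix of contact probabilities.

    Residue embeddings act as queries and nucleotide embeddings as keys.
    The sigmoid over scaled dot-product scores is an assumed read-out.
    """
    q = linear(protein_emb, w_q)
    k = linear(dna_emb, w_k)
    scale = math.sqrt(len(q[0]))
    return [
        [sigmoid(sum(a * b for a, b in zip(qi, kj)) / scale) for kj in k]
        for qi in q
    ]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or model embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
protein_emb = random_matrix(5, 8, rng)  # 5 residues, 8-dim embeddings (assumed sizes)
dna_emb = random_matrix(7, 6, rng)      # 7 nucleotides, 6-dim embeddings (assumed sizes)
w_q = random_matrix(8, 4, rng)          # projects residues into a shared 4-dim space
w_k = random_matrix(6, 4, rng)          # projects nucleotides into the same space

contacts = contact_map(protein_emb, dna_emb, w_q, w_k)
print(len(contacts), len(contacts[0]))
```

In a trained model the projections would be learned and the embeddings would come from the protein and DNA foundation models. Thresholding `contacts` would then give a binary contact map that can be scored with F1 against experimentally derived contacts.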