

Poster in Workshop: Foundation Models for Science: Progress, Opportunities, and Challenges

Understanding Protein-DNA Interactions by Paying Attention to Protein and Genomics Foundation Models

Dhruva Rajwade · Erica Wang · Aryan Satpathy · Alexander Brace · Hongyu Guo · Arvind Ramanathan · Shengchao Liu · Animashree Anandkumar

Keywords: [ Deep Learning ] [ Genomic Foundation models ] [ Structural Biology ] [ Protein ] [ DNA ]


Abstract:

Protein-nucleic acid (NA) interactions are key to controlling gene regulation. There is strong motivation to understand these interactions, with the goal of engineering them to solve biological problems. Current methods to quantify protein-NA interactions are mainly experimental and require considerable time and money. To mitigate this, deep learning methods have recently been applied to predict protein-DNA contacts. Although promising, these methods are computationally expensive and face challenges in accuracy. To address these challenges, we propose Seq2Contact, a novel method to predict protein-NA binding at the level of single nucleotides (DNA) and single amino acids (protein). Seq2Contact builds on protein and DNA foundation models to obtain nucleotide-specific and amino acid-specific embeddings, and then introduces a cross-attention module to obtain the binding contact maps. We employ a sequence-similarity-based clustering method to split the train and test data, and empirically show that Seq2Contact achieves state-of-the-art performance, beating existing baselines by almost 20% (F1-score) for protein-NA binding prediction. Our method is also computationally more efficient, with up to 80% less memory cost and more than 90% less inference time. Code is available at https://anonymous.4open.science/r/Protein-DNA-Attention-8602/README.md.
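
To illustrate how a cross-attention step can turn per-residue and per-nucleotide embeddings into a contact map, here is a minimal NumPy sketch. It is not the Seq2Contact implementation: the function name `contact_map`, the projection matrices `w_q` and `w_k`, the embedding sizes, the sigmoid readout, and the 0.5 threshold are all assumptions chosen for illustration, and random arrays stand in for the foundation-model embeddings.

```python
import numpy as np


def contact_map(protein_emb, dna_emb, w_q, w_k):
    """Score every (amino acid, nucleotide) pair with scaled dot-product attention.

    protein_emb: (L_p, d_p) per-residue embeddings (stand-in for a protein model)
    dna_emb:     (L_n, d_n) per-nucleotide embeddings (stand-in for a DNA model)
    w_q, w_k:    hypothetical projections into a shared d_k-dimensional space
    Returns an (L_p, L_n) matrix of contact probabilities.
    """
    q = protein_emb @ w_q                      # queries from the protein, (L_p, d_k)
    k = dna_emb @ w_k                          # keys from the DNA, (L_n, d_k)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # pairwise scores, (L_p, L_n)
    # Assumption: an independent sigmoid per pair, since one residue may
    # contact several nucleotides (a softmax would force a single choice).
    return 1.0 / (1.0 + np.exp(-scores))


# Random stand-ins for foundation-model outputs (sizes are arbitrary).
rng = np.random.default_rng(0)
protein_emb = rng.normal(size=(120, 64))   # 120 residues
dna_emb = rng.normal(size=(30, 32))        # 30 nucleotides
w_q = rng.normal(size=(64, 16)) * 0.1
w_k = rng.normal(size=(32, 16)) * 0.1

probs = contact_map(protein_emb, dna_emb, w_q, w_k)
predicted_contacts = probs > 0.5           # binary contact map at an assumed threshold
```

In a trained model the projections would be learned against experimentally derived contact labels, and the binary map would be scored against those labels with a metric such as the F1-score reported above.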