

Poster in Workshop: Learning Meaningful Representations of Life

TranceptEVE: Combining Family-specific and Family-agnostic Models of Protein Sequences for Improved Fitness Prediction

Pascal Notin · Lodevicus van Niekerk · Aaron Kollasch · Daniel Ritter · Yarin Gal · Debora Marks


Abstract:

Successful approaches to modeling the fitness landscape of protein sequences have typically relied on family-specific sets of homologous sequences called multiple sequence alignments (Hopf et al. 2017; Riesselman et al. 2018; Frazer et al. 2021). They are, however, limited by the fact that many proteins are difficult to align or have shallow alignments. Newer models that do not rely on alignments, such as transformers, have shown promise (Madani et al. 2020; Rives et al. 2021; Notin et al. 2022; Hesslow et al. 2022) and are progressively closing the gap with their alignment-based counterparts. In this work, we introduce TranceptEVE, a hybrid between family-specific and family-agnostic models that seeks to build on the relative strengths of each approach to achieve state-of-the-art performance on the fitness prediction task. We demonstrate that it outperforms all baselines on the recently released ProteinGym benchmarks (Notin et al. 2022), a curated set of 94 deep mutational scanning assays for assessing the effects of substitution and indel mutations. We also quantify its ability to predict the pathogenicity of genetic mutations in humans based on annotations from ClinVar.
