Poster in Affinity Event: LatinX in AI
Identification of Antigen Specific B-cell receptors from single-cell V(D)J sequences using a Large Language Model, ASPred
Karen Paco · Zihao Zhang · Mariana Paco Mendivil · Peace Olatoyinbo · Sanaz Zebardast · Dhruv Patel · Isabel Condori · Jordan Lay · Karine Le Roch · Tristan Yang · Jonathan Felix · Jeniffer Hernandez · Matthew Sazinsky · Ilya Tolstorukov · Stefano Lonardi · Animesh Ray
Rapid sequencing of antibody genes has accelerated vaccine development. Despite this progress, predicting synthetic antibodies capable of binding and neutralizing novel antigens remains a complex challenge, largely because the rules governing protein-protein interaction at the antigen surface where the cognate antibody binds are poorly understood. Recent advances in single-cell sequencing of antibody-producing B cells have improved the precision with which B-cell receptors (BCRs, the membrane-bound forms of the antibodies produced by those B cells) can be mapped to their cognate antigens, but challenges remain. Here we demonstrate that known sequences of antigen-BCR pairs can be used to train a large language model (LLM), which we call Antibody Specificity Predictor (ASPred), to predict antigen-specific BCRs from the total BCR repertoire of immunized mice. By leveraging the pattern-recognition capabilities of LLMs, we classified BCR sequences from mice immunized with a challenge antigen and predicted antigen-specific BCRs without preselecting B cells based on antigen binding. These results suggest that BCR-antigen sequence pairs contain sufficient information for LLMs to reliably predict antigen-antibody interaction specificity, potentially opening new avenues for the computational design of synthetic antibodies, with broad implications for vaccine development and therapeutic discovery.