Poster in Workshop: Machine Learning in Structural Biology Workshop
T-cell receptor specific protein language model for prediction and interpretation of epitope binding (ProtLM.TCR)
Ahmed Essaghir
The cellular adaptive immune response relies on epitope recognition by T-cell receptors (TCRs). We used a protein language model for TCRs (ProtLM.TCR) to predict TCR-epitope binding. The model was pre-trained on a large set of TCR sequences and then fine-tuned to predict TCR-epitope binding across multiple human leukocyte antigen (HLA) class I types. We tested ProtLM.TCR on a set balanced between binders and non-binders for each epitope, so that the model cannot rely on shortcuts such as HLA category. We compared pan-HLA and HLA-specific models. Our results show that computational prediction of novel TCR-epitope binding probability is feasible, but more diverse datasets are required to generalize to de novo epitope binding prediction. We also show that ProtLM.TCR embeddings outperform BLOSUM-based and hand-crafted embeddings. Finally, we used the LIME framework to examine the interpretability of these predictions.
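The abstract does not say how non-binders were chosen for the per-epitope balanced test set. One common way to keep HLA from acting as a shortcut is to draw each epitope's negatives from TCRs known to bind a *different* epitope presented by the *same* HLA allele, then subsample so binders and non-binders are equal in number. The sketch below illustrates that idea under this assumption only; `Pair`, `balanced_test_set`, and the toy sequences are hypothetical and are not taken from ProtLM.TCR.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Pair(NamedTuple):
    """Hypothetical record for one known binding pair."""
    tcr: str      # e.g. a CDR3-beta amino-acid sequence
    epitope: str
    hla: str


def balanced_test_set(positives, seed=0):
    """Return (tcr, epitope, hla, label) rows with equal binders and
    non-binders for every (epitope, HLA) group.

    Assumption: negatives for an epitope are TCRs that bind another epitope
    restricted by the same HLA allele, so HLA alone cannot separate classes.
    Groups with no such candidates are skipped.
    """
    rng = random.Random(seed)
    by_key = defaultdict(list)   # (epitope, hla) -> positive pairs
    by_hla = defaultdict(list)   # hla -> all positive pairs for that allele
    for p in positives:
        by_key[(p.epitope, p.hla)].append(p)
        by_hla[p.hla].append(p)

    rows = []
    for (epitope, hla), pos in sorted(by_key.items()):
        binders = {p.tcr for p in pos}
        # Same-HLA TCRs from other epitopes, excluding any known binders.
        candidates = sorted(
            {q.tcr for q in by_hla[hla] if q.epitope != epitope} - binders
        )
        n = min(len(pos), len(candidates))
        if n == 0:
            continue
        rows += [(p.tcr, epitope, hla, 1) for p in rng.sample(pos, n)]
        rows += [(t, epitope, hla, 0) for t in rng.sample(candidates, n)]
    return rows


if __name__ == "__main__":
    # Illustrative toy data only.
    toy = [
        Pair("CASSLGQAYEQYF", "GILGFVFTL", "HLA-A*02:01"),
        Pair("CASSIRSSYEQYF", "GILGFVFTL", "HLA-A*02:01"),
        Pair("CASSPGTGGYEQYF", "NLVPMVATV", "HLA-A*02:01"),
        Pair("CASSLAPGATNEKLFF", "NLVPMVATV", "HLA-A*02:01"),
        Pair("CASRPGLAGGRPEQYF", "KLGGALQAK", "HLA-A*03:01"),
    ]
    for row in balanced_test_set(toy):
        print(row)
```

Because negatives share the positives' HLA allele within each group, a classifier that only learned HLA identity would score at chance on this set. The lone HLA-A*03:01 epitope in the toy data is dropped because it has no same-allele negatives, which mirrors the abstract's point that broader datasets are needed.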