Poster in Workshop: Generative AI and Biology (GenBio@NeurIPS2023)
Conditional Generation of Antigen Specific T-cell Receptor Sequences
Dhuvarakesh Karthikeyan · Colin Raffel · Benjamin Vincent · Alex Rubinsteyn
Keywords: [ Seq2Seq ] [ Many to Many ] [ Immunology ] [ Large language models ]
Training and evaluating large language models (LLMs) to design antigen-specific T-cell receptor (TCR) sequences is challenging due to the complex many-to-many mapping between TCRs and their targets, a problem exacerbated by a severe lack of ground-truth data. Traditional NLP metrics can be artificially poor indicators of model performance because labels are concentrated on a few examples, and functional in vitro assessment of generated TCRs is time-consuming and costly. Here, we introduce TCR-BART and TCR-T5, adapted from the prominent BART and T5 models, to explore the use of LLMs for conditional TCR sequence generation given a specific epitope of interest. To fairly evaluate such models with limited labeled examples, we propose novel evaluation metrics tailored to the sparsely sampled, many-to-many nature of TCR-epitope data and investigate the interplay between the accuracy and diversity of generated TCR sequences.
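To make the setup concrete, the sketch below shows what epitope-conditioned TCR generation with a T5-style seq2seq model might look like using the Hugging Face transformers API. The checkpoint name, residue-level tokenization, and sampling parameters are all illustrative assumptions on our part; the paper's actual TCR-T5 checkpoint and preprocessing are not specified here and may differ.

```python
# Minimal sketch of epitope-conditioned TCR generation with a T5-style
# seq2seq model. A generic public checkpoint stands in for TCR-T5, which
# this sketch does not assume is publicly released.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")  # placeholder checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Condition on an epitope amino-acid sequence; spacing residues apart is an
# assumed tokenization so each amino acid maps to its own token.
epitope = " ".join("GILGFVFTL")  # example: a well-known influenza M1 epitope

inputs = tokenizer(epitope, return_tensors="pt")

# Sample several candidate TCR sequences per epitope rather than taking a
# single greedy decode, so both accuracy and diversity can be assessed.
outputs = model.generate(
    **inputs,
    do_sample=True,
    top_k=50,
    num_return_sequences=8,
    max_new_tokens=32,
)

for seq in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(seq.replace(" ", ""))  # collapse back to a plain amino-acid string
```

Drawing multiple samples per epitope is what makes the accuracy/diversity interplay discussed in the abstract measurable at all: a greedy decoder yields one sequence per epitope and therefore cannot reveal how broadly a model covers the many valid TCRs for a given target.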