T-cell receptor specific protein language model for prediction and interpretation of epitope binding (ProtLM.TCR)

The cellular adaptive immune response relies on epitope recognition by T-cell receptors (TCRs). We used a language model for TCRs (ProtLM.TCR) to predict TCR-epitope binding. The model was pre-trained on a large set of TCR sequences (~62 × 10^6) and then fine-tuned to predict TCR-epitope binding across multiple class I human leukocyte antigen (HLA) types. We then tested ProtLM.TCR on a balanced set of binders and non-binders for each epitope, which prevents the model from exploiting shortcuts such as HLA categories. We compared pan-HLA and HLA-specific models. Our results show that computational prediction of novel TCR-epitope binding probability is feasible, but that more epitopes and more diverse training datasets are required to achieve better generalized performance in de novo epitope binding prediction tasks. We also show that ProtLM.TCR embeddings outperform BLOSUM scores and hand-crafted embeddings. Finally, we used the LIME framework to examine the interpretability of these predictions.
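
The per-epitope balancing of binders and non-binders can be illustrated with a minimal sketch. This is not the procedure used for ProtLM.TCR, whose details are not given here. It assumes labelled (TCR, epitope, label) triples and simply downsamples the larger class within each epitope; the function name `balance_per_epitope` and the toy placeholder sequences are hypothetical.

```python
import random
from collections import defaultdict


def balance_per_epitope(pairs, seed=0):
    """Downsample so each epitope has equally many binders and non-binders.

    pairs: iterable of (tcr, epitope, label), label 1 = binder, 0 = non-binder.
    Epitopes that lack one of the two classes contribute no examples.
    """
    rng = random.Random(seed)
    # Group examples by epitope, then by label.
    by_epitope = defaultdict(lambda: {0: [], 1: []})
    for tcr, epitope, label in pairs:
        by_epitope[epitope][label].append((tcr, epitope, label))

    balanced = []
    for epitope in sorted(by_epitope):
        groups = by_epitope[epitope]
        # Size of the smaller class decides how many of each label to keep.
        n = min(len(groups[0]), len(groups[1]))
        for label in (0, 1):
            balanced.extend(rng.sample(groups[label], n))
    return balanced


if __name__ == "__main__":
    # Toy placeholder data, not real TCR or epitope sequences.
    toy = [
        ("TCR_A", "EPI_1", 1), ("TCR_B", "EPI_1", 1), ("TCR_C", "EPI_1", 0),
        ("TCR_D", "EPI_2", 1), ("TCR_E", "EPI_2", 0), ("TCR_F", "EPI_2", 0),
    ]
    for row in balance_per_epitope(toy):
        print(row)
```

Balancing within each epitope, and not only over the whole dataset, means that the epitope identity (and anything correlated with it, such as the HLA type) carries no information about the label on its own.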
