T-cell receptor specific protein language model for prediction and interpretation of epitope binding (ProtLM.TCR)

The cellular adaptive immune response relies on epitope recognition by T-cell receptors (TCRs). We used a language model for TCRs (ProtLM.TCR) to predict TCR-epitope binding. The model was pre-trained on a large set of TCR sequences (~62 × 10^6) and then fine-tuned to predict TCR-epitope binding across multiple human leukocyte antigen (HLA) class I types. We then tested ProtLM.TCR on a balanced set of binders and non-binders for each epitope, which prevents the model from exploiting shortcuts such as HLA categories. We compared pan-HLA and HLA-specific models. Our results show that computational prediction of novel TCR-epitope binding probability is feasible, but that more epitopes and more diverse training datasets are required to achieve better generalized performance in de novo epitope binding prediction tasks. We also show that ProtLM.TCR embeddings outperform BLOSUM scores and hand-crafted embeddings. Finally, we used the LIME framework to examine the interpretability of these predictions.
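The balanced evaluation set described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `balance_per_epitope`, the `(tcr, epitope, label)` tuple layout, and the toy records are assumptions made here for illustration. The idea is to downsample each epitope to an equal number of binders and non-binders, so that a classifier cannot score well simply by learning which epitope (or HLA type) has more positives.

```python
import random
from collections import defaultdict


def balance_per_epitope(pairs, seed=0):
    """Downsample so each epitope has equally many binders and non-binders.

    `pairs` is an iterable of (tcr_sequence, epitope, label) tuples, where
    label is 1 for a binder and 0 for a non-binder (hypothetical layout).
    Epitopes that lack one of the two classes are dropped entirely.
    """
    rng = random.Random(seed)
    by_epitope = defaultdict(lambda: {0: [], 1: []})
    for tcr, epitope, label in pairs:
        by_epitope[epitope][label].append((tcr, epitope, label))

    balanced = []
    for epitope in sorted(by_epitope):
        groups = by_epitope[epitope]
        n = min(len(groups[0]), len(groups[1]))
        for label in (0, 1):
            # Sample without replacement from each class.
            balanced.extend(rng.sample(groups[label], n))
    return balanced


# Toy records for illustration only; the sequences and labels are made up.
toy_pairs = [
    ("CASSLAF", "GILGFVFTL", 1),
    ("CASSLGF", "GILGFVFTL", 1),
    ("CASSLDF", "GILGFVFTL", 1),
    ("CASSLEF", "GILGFVFTL", 0),
    ("CASSLKF", "NLVPMVATV", 1),
    ("CASSLNF", "NLVPMVATV", 0),
    ("CASSLQF", "NLVPMVATV", 0),
    ("CASSLRF", "ELAGIGILTV", 1),  # no non-binder, so this epitope is dropped
]

balanced_pairs = balance_per_epitope(toy_pairs)
```

Downsampling the majority class is only one way to balance such a set; generating additional non-binders by shuffling TCR-epitope pairs is another common choice, and the abstract does not say which approach was used.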

A. Essaghir | S. Phogat | Shruti Kapil | Gurpreet Singh | Anjana Singh | Nanda Kumar Sathiyamoorthy | P. Smyth | Adrian Postelnicu | S. Ghiviriga | Alexandru Ghita | Ahmed Essaghir

[1] Howard Y. Chang, et al. TCR-BERT: learning the grammar of T-cell receptors for flexible antigen-binding analyses, 2021, bioRxiv.

[2] Jannis Born, et al. TITAN: T-cell receptor specificity prediction with bimodal attention networks, 2021, Bioinform.

[3] Wout Bittremieux, et al. Current challenges for unseen-epitope TCR interaction prediction and a new perspective derived from image classification, 2020, Briefings Bioinform.

[4] Chloe H. Lee, et al. Predicting Cross-Reactivity and Antigen Specificity of T Cell Receptors, 2020, Frontiers in Immunology.

[5] Felix Mölder, et al. Rapid T cell receptor interaction grouping with ting, 2020, bioRxiv.

[6] Geir Kjetil Sandve, et al. Modern Hopfield Networks and Attention for Immune Repertoire Classification, 2020, bioRxiv.

[7] Morten Nielsen, et al. T Cell Epitope Predictions, 2020, Annual review of immunology.

[8] Andrew K. Sewell, et al. VDJdb in 2019: database extension, new analysis infrastructure and a T-cell receptor motif compendium, 2019, Nucleic Acids Res.

[9] Wojciech Samek, et al. UDSMProt: universal deep sequence models for protein classification, 2019, bioRxiv.

[10] John Canny, et al. Evaluating Protein Transfer Learning with TAPE, 2019, bioRxiv.

[11] H. Yotsuyanagi, et al. Quantitative Prediction of the Landscape of T Cell Epitope Immunogenicity in Sequence Space, 2019, Front. Immunol.

[12] Alessandro Sette, et al. The Immune Epitope Database (IEDB): 2018 update, 2018, Nucleic Acids Res.

[13] Huanming Yang, et al. PIRD: Pan immune repertoire database, 2018, bioRxiv.

[14] Nicole L La Gruta, et al. Understanding the drivers of MHC restriction of T cell receptors, 2018, Nature Reviews Immunology.

[15] Alessandro Sette, et al. ImmunomeBrowser: a tool to aggregate and visualize complex and heterogeneous epitopes in reference proteins, 2018, Bioinform.

[16] Jaime Prilusky, et al. McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences, 2017, Bioinform.

[17] William S. DeWitt, et al. Immunosequencing identifies signatures of cytomegalovirus exposure history and HLA-mediated effects on the T cell repertoire, 2017, Nature Genetics.

[18] Carlos Guestrin, et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, ArXiv.

[19] James McCluskey, et al. T cell antigen receptor recognition of antigen-presenting molecules, 2015, Annual review of immunology.

[20] P. Dash, et al. The Public Face and Private Lives of T Cell Receptor Repertoires, 2021.

[21] T. N. Bhat, et al. The Protein Data Bank, 2000, Nucleic Acids Res.