SETE: Sequence-based Ensemble learning approach for TCR Epitope binding prediction

Predicting the binding of T cell receptors (TCRs) to epitopes plays a vital role in immunotherapy, because it guides the development of therapeutic vaccines and cancer treatments. Many prediction methods have attempted to explain the relationship between TCR repertoires from different aspects, such as the V(D)J gene locus and the biophysical features of amino acid molecules, but extracting these features is time-consuming and the performance of these models is limited. Few studies have investigated how k-mers formed by adjacent amino acids in TCR sequences direct epitope recognition, and the specific mechanism of TCR-epitope binding is still unclear. Motivated by these gaps, we present SETE (Sequence-based Ensemble learning approach for TCR Epitope binding prediction), a novel model for accurately predicting TCR-epitope binding. The model deconstructs the CDR3β sequence into short amino acid chains, uses them as features, and learns how their patterns differ between TCR repertoires with a gradient boosting decision tree algorithm. Experiments demonstrate that SETE is helpful in predicting the epitopes corresponding to TCRs and that it outperforms other state-of-the-art methods in predicting the epitope specificity of TCRs on the VDJdb data set. The source code is available at https://github.com/wonanut/SETE for academic use only.
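The feature extraction step described above can be illustrated with a minimal Python sketch. It is not the SETE implementation: the function names, the choice of k = 3, and the two toy CDR3β strings are assumptions made only for illustration. The sketch splits each sequence into overlapping k-mers and builds a count matrix with one column per k-mer observed in the input.

```python
from collections import Counter


def kmers(seq, k=3):
    """Return all overlapping k-mers of seq, read left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_count_matrix(sequences, k=3):
    """Build a count matrix: one row per sequence, one column per k-mer.

    The vocabulary is the sorted set of k-mers seen in the input, so the
    column order is deterministic.
    """
    counts = [Counter(kmers(s, k)) for s in sequences]
    vocab = sorted(set().union(*counts))
    matrix = [[c[km] for km in vocab] for c in counts]
    return vocab, matrix


# Toy CDR3-beta-like strings, used only to exercise the functions (assumption).
cdr3b = ["CASSLGQAYEQYF", "CASSIRSSYEQYF"]
vocab, X = kmer_count_matrix(cdr3b, k=3)

for seq, row in zip(cdr3b, X):
    present = [km for km, n in zip(vocab, row) if n > 0]
    print(seq, present)
```

A matrix of this shape, paired with one epitope label per sequence, is the kind of input a gradient boosting classifier expects, for instance scikit-learn's `GradientBoostingClassifier`. Which library and hyperparameters SETE actually uses is not stated in this abstract.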
