Benchmarking predictions of MHC class I restricted T cell epitopes

T cell epitope candidates are commonly identified using computational prediction tools to enable applications such as vaccine design, cancer neoantigen identification, development of diagnostics, and removal of unwanted immune responses against protein therapeutics. Most T cell epitope prediction tools are based on machine learning algorithms trained on MHC binding data or on naturally processed MHC ligand elution data. The ability of currently available tools to predict T cell epitopes has not been comprehensively evaluated. In this study, we used a recently published dataset that systematically defined T cell epitopes recognized in vaccinia virus (VACV) infected mice, considering both peptides predicted to bind MHC and peptides experimentally eluted from infected cells, which makes it the most comprehensive dataset of T cell epitopes mapped in a complex pathogen. We evaluated how well all currently publicly available computational T cell epitope prediction tools identify these major epitopes among all peptides encoded in the VACV proteome. We found that all methods improved epitope identification above random, with the best performance achieved by neural network-based predictions trained on both MHC binding and MHC ligand elution data (NetMHCpan-4.0 and MHCflurry). Impressively, these methods captured more than half of the major epitopes in the top 0.04% (N = 277) of peptides in the VACV proteome (N = 767,788). These performance metrics provide guidance for immunologists as to which prediction methods to use. In addition, this benchmark was implemented in an open and easy-to-reproduce format, providing developers with a framework for future comparisons against new tools.

Author summary

Computational prediction tools are used to screen peptides and identify potential T cell epitope candidates. These tools, developed using machine learning methods, save time and resources in many immunological studies, including vaccine discovery and cancer neoantigen identification. New epitope prediction tools continue to be developed alongside the existing ones, but there has been no comprehensive and uniform evaluation to show which method performs best. In this study, we comprehensively evaluated publicly accessible MHC class I restricted T cell epitope prediction tools using a recently published dataset of vaccinia virus epitopes. We found that methods based on artificial neural network architectures and trained on both MHC binding and ligand elution data performed best (NetMHCpan-4.0 and MHCflurry). This benchmark analysis will help immunologists choose the right prediction method for their work and will also serve as a framework for tool developers to evaluate new prediction methods.
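
The top-ranked-fraction figure quoted in the abstract (277 of 767,788 peptides, about 0.04%) can be illustrated with a minimal Python sketch. This is not the benchmark's actual code: the function names, the toy peptides, and the assumption that a higher score means a stronger prediction are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def top_fraction_percent(top_n: int, total: int) -> float:
    """Percentage of the peptide pool covered by the top_n ranked peptides."""
    return 100.0 * top_n / total


def epitopes_captured(scores: Dict[str, float], epitopes: Set[str], top_n: int) -> float:
    """Fraction of known epitopes found among the top_n best-scoring peptides.

    Assumption: a higher score means a stronger prediction. Tools that report
    percentile ranks or IC50 values (lower is better) would need the sort reversed.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = set(ranked[:top_n])
    return len(top & epitopes) / len(epitopes)


# Numbers taken from the abstract: 277 peptides out of 767,788.
print(round(top_fraction_percent(277, 767_788), 2))  # 0.04

# Hypothetical toy data, not real VACV peptides or real prediction scores.
toy_scores = {"PEPTIDEA": 0.9, "PEPTIDEB": 0.8, "PEPTIDEC": 0.3, "PEPTIDED": 0.1}
toy_epitopes = {"PEPTIDEA", "PEPTIDEC"}
print(epitopes_captured(toy_scores, toy_epitopes, top_n=2))  # 0.5
```

In the toy example, one of the two known epitopes falls within the top two ranked peptides, so half of the epitopes are captured.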
