Predicting peptide presentation by major histocompatibility complex class I: an improved machine learning approach to the immunopeptidome

Background: To further our understanding of immunopeptidomics, improved tools are needed to identify peptides presented by major histocompatibility complex class I (MHC-I). Many existing tools are limited by their reliance upon chemical affinity data, which is less biologically relevant than sampling by mass spectrometry, and other tools are limited by incomplete exploration of machine learning approaches. Herein, we assemble publicly available data describing human peptides discovered by sampling the MHC-I immunopeptidome with mass spectrometry and use this database to train random forest classifiers (ForestMHC) to predict presentation by MHC-I.

Results: As measured by precision in the top 1% of predictions, our method outperforms NetMHC and NetMHCpan on test sets, and it outperforms both these methods and MixMHCpred on new data from an ovarian carcinoma cell line. We also find that random forest scores correlate monotonically, but not linearly, with known chemical binding affinities, and an information-based analysis of classifier features shows the importance of anchor positions for our classification. The random forest approach also outperforms a deep neural network and a convolutional neural network trained on identical data. Finally, we use our large database to confirm that gene expression partially determines peptide presentation.

Conclusions: ForestMHC is a promising method to identify peptides bound by MHC-I. We have demonstrated the utility of random forest-based approaches in predicting peptide presentation by MHC-I, assembled the largest known database of MS binding data, and mined this database to show the effect of gene expression on peptide presentation. ForestMHC has potential applicability to basic immunology, rational vaccine design, and neoantigen binding prediction for cancer immunotherapy. This method is publicly available for applications and further validation.
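The headline metric, precision in the top 1% of predictions, can be illustrated with a minimal sketch. This is not the ForestMHC implementation: the function name `precision_at_top_fraction`, the rounding of the cutoff, and the tie-handling are assumptions made here for illustration, and the scores and labels are toy values rather than data from the study.

```python
def precision_at_top_fraction(scores, labels, fraction=0.01):
    """Share of true positives among the highest-scoring `fraction` of items.

    Hypothetical helper for illustration; `labels` holds 1 for a presented
    peptide and 0 for a decoy, `scores` holds the classifier output.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Assumption: the cutoff is rounded to the nearest item, with at least one kept.
    n_top = max(1, round(len(scores) * fraction))

    # Rank indices by descending score; the stable sort keeps input order on ties.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[:n_top]

    return sum(labels[i] for i in top) / n_top


# Toy data: 200 candidate peptides with increasing scores.
toy_scores = [i / 200 for i in range(200)]
toy_labels = [0] * 200
toy_labels[199] = 1  # the best-scoring peptide is a true ligand
toy_labels[150] = 1  # a true ligand that falls outside the top 1%

result = precision_at_top_fraction(toy_scores, toy_labels, fraction=0.01)
print(result)
```

With 200 candidates, the top 1% contains two peptides, so the metric only rewards true ligands ranked at the very top. That suits a setting where presented peptides are rare relative to decoys.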
