Predicting Peptide-MHC Binding Affinities with Imputed Training Data

Predicting the binding affinity between MHC proteins and their peptide ligands is a key problem in computational immunology. State-of-the-art performance is currently achieved by the allele-specific predictor NetMHC and the pan-allele predictor NetMHCpan, both of which are ensembles of shallow neural networks. We explore an intermediate between allele-specific and pan-allele prediction: training allele-specific predictors with synthetic samples generated by imputation of the peptide-MHC affinity matrix. We find that the imputation strategy is useful for alleles with very little training data. We have implemented our predictor as an open-source software package called MHCflurry and show that MHCflurry achieves performance competitive with NetMHC and NetMHCpan.
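To make the idea of imputing the peptide-MHC affinity matrix concrete, here is a minimal sketch in Python. It assumes a matrix whose rows are peptides and whose columns are alleles, with NaN marking unmeasured pairs, and fills the gaps by repeated truncated-SVD reconstruction. The function name `impute_low_rank`, the rank, the iteration count, and the toy values are all illustrative assumptions; the abstract does not say which imputation algorithm MHCflurry uses, so this should not be read as the authors' method.

```python
import numpy as np


def impute_low_rank(affinities, rank=1, n_iters=100):
    """Fill NaN entries by repeated truncated-SVD reconstruction.

    Hypothetical helper for illustration only, not MHCflurry's API.
    Assumes every column has at least one observed value.
    """
    observed = ~np.isnan(affinities)
    # Start from the per-allele (column) mean of the observed entries.
    col_means = np.nanmean(affinities, axis=0)
    filled = np.where(observed, affinities, col_means)
    for _ in range(n_iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Keep measured values; overwrite only the missing entries.
        filled = np.where(observed, affinities, low_rank)
    return filled


# Toy rank-1 "affinity" matrix: 4 peptides (rows) x 3 alleles (columns).
toy = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 2.0])

# Hide two peptide-allele pairs to mimic unmeasured affinities.
masked = toy.copy()
masked[0, 1] = np.nan
masked[1, 0] = np.nan

completed = impute_low_rank(masked, rank=1)
print(completed)
```

In a setting like the one described, the filled-in entries for a data-poor allele could serve as synthetic training samples alongside the measured ones; how such samples are weighted or used during training is not specified here.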
