Modeling major histocompatibility complex binding by nonparametric averaging of multiple predictors and sequence encodings.

There has been considerable interest in statistical approaches that use the large volumes of available experimental data to predict the binding of Major Histocompatibility Complex class I (MHC-I) molecules to peptides. Here we present a method for combining multiple predictors of MHC-peptide binding. Given a particular MHC molecule, a set of simple predictors and a set of training peptides, the method averages the predictors to produce a final prediction of the binding affinity between that MHC molecule and a test peptide. The averaging is nonparametric: for any test peptide, we identify similar peptides in the training set and average the predictions, weighting each predictor by its average accuracy on those similar training peptides. We show on held-out data that the method significantly improves on the individual predictors. Its accuracy is also competitive with state-of-the-art techniques, based on the results of the Machine Learning in Immunology competition, in which 21 submitted techniques were assessed on their accuracy in predicting the binding of HLA-A*0101, HLA-A*0201 and HLA-B*0702 molecules to 9-mer and 10-mer peptides.
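The locally weighted averaging described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The Gaussian kernel on Hamming distance, the inverse-error weighting, the `bandwidth` and `eps` parameters, and all function names are assumptions chosen for illustration; the abstract does not specify the similarity measure or the exact weighting rule.

```python
import math


def hamming(a, b):
    """Number of mismatched positions between two equal-length peptides."""
    if len(a) != len(b):
        raise ValueError("peptides must have equal length")
    return sum(x != y for x, y in zip(a, b))


def kernel_weights(test_peptide, train_peptides, bandwidth=2.0):
    """Similarity of each training peptide to the test peptide.

    Assumption: a Gaussian kernel on Hamming distance stands in for
    whatever peptide similarity the method actually uses.
    """
    return [
        math.exp(-((hamming(test_peptide, p) / bandwidth) ** 2))
        for p in train_peptides
    ]


def local_average_prediction(test_peptide, train_peptides, train_affinities,
                             predictors, bandwidth=2.0, eps=1e-9):
    """Combine predictors, weighting each by its accuracy near the test peptide.

    predictors: list of callables mapping a peptide string to a predicted
    affinity (hypothetical interface).
    """
    if not train_peptides:
        raise ValueError("training set must not be empty")
    k = kernel_weights(test_peptide, train_peptides, bandwidth)
    total_k = sum(k)

    weights = []
    for predict in predictors:
        # Kernel-weighted mean squared error of this predictor on the
        # training peptides, dominated by peptides similar to the test one.
        local_mse = sum(
            ki * (predict(p) - y) ** 2
            for ki, p, y in zip(k, train_peptides, train_affinities)
        ) / total_k
        # Assumption: accuracy is scored as inverse local error.
        weights.append(1.0 / (local_mse + eps))

    total_w = sum(weights)
    return sum(
        w * predict(test_peptide) for w, predict in zip(weights, predictors)
    ) / total_w
```

In this sketch a predictor that is accurate on training peptides resembling the test peptide dominates the average, while a predictor that is accurate only elsewhere in sequence space contributes little. A smaller `bandwidth` makes the weighting more local.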
