Predicting Class II MHC-Peptide binding: a kernel based approach using similarity scores

Background: Modelling the interaction between potentially antigenic peptides and Major Histocompatibility Complex (MHC) molecules is a key step in identifying potential T-cell epitopes. For Class II MHC alleles, the binding groove is open at both ends. This makes the positional alignment between groove and peptide ambiguous and leaves it uncertain which parts of the peptide interact with the MHC. Moreover, the antigenic peptides have variable lengths, making naive modelling methods difficult to apply. This paper introduces a kernel method that handles variable-length peptides effectively by quantifying similarities between peptide sequences and integrating these into the kernel.

Results: The kernel approach presented here shows increased prediction accuracy, with a significantly higher number of true positives and negatives on multiple MHC class II alleles, when tested on data sets from MHCPEP [1], MHCBN [2], and MHCBench [3]. Evaluation by cross-validation, when segregating binders and non-binders, produced an average of 0.824 AROC for the MHCBench data sets (up from 0.756), and an average of 0.96 AROC for multiple alleles of the MHCPEP database.

Conclusion: The method improves performance over existing state-of-the-art methods of MHC class II peptide binding prediction by using a custom, knowledge-based representation of peptides. Similarity scores, in contrast to a fixed-length, pocket-specific representation of amino acids, provide a flexible and powerful way of modelling MHC binding, and can easily be applied to other dynamic sequence problems.
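To make the idea of a similarity-score kernel over variable-length peptides concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Smith-Waterman local alignment score with a toy identity scoring scheme (match, mismatch, and linear gap values chosen purely for illustration) in place of a real substitution matrix, and normalises each score by the peptides' self-scores. The function names `sw_score` and `similarity_matrix` are hypothetical. Note that raw alignment scores are not guaranteed to form a positive semi-definite matrix, so a real kernel built this way may need an additional correction.

```python
def sw_score(a, b, match=2.0, mismatch=-1.0, gap=-2.0):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for ca in a:
        curr = [0.0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0.0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def similarity_matrix(peptides):
    """Normalised pairwise similarity matrix for non-empty peptides.

    Entry (i, j) is s(i, j) / sqrt(s(i, i) * s(j, j)), so the diagonal is 1
    and peptides of different lengths are directly comparable.
    """
    n = len(peptides)
    self_scores = [sw_score(p, p) for p in peptides]
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = sw_score(peptides[i], peptides[j])
            matrix[i][j] = s / (self_scores[i] * self_scores[j]) ** 0.5
    return matrix


if __name__ == "__main__":
    # Example peptides of different lengths (chosen for illustration only).
    peptides = ["PKYVKQNTLKLAT", "KYVKQNTLK", "GGGG"]
    for row in similarity_matrix(peptides):
        print([round(v, 3) for v in row])
```

A matrix of this kind could be passed to any learner that accepts a precomputed kernel, which is what lets peptides of different lengths be compared without first forcing them into a fixed-length encoding.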

[1]  Hiroyuki Ogata,et al.  AAindex: Amino Acid Index Database , 1999, Nucleic Acids Res..

[2]  Philip J Norris,et al.  A hairpin turn in a class II MHC-bound peptide orients residues outside the binding groove for T cell recognition. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[3]  F E Cohen,et al.  Pairwise sequence alignment below the twilight zone. , 2001, Journal of molecular biology.

[4]  E. Bergseng,et al.  Inhibition of HLA-DQ2-mediated antigen presentation by analogues of a high affinity 33-residue peptide from alpha2-gliadin. , 2006, Journal of the American Chemical Society.

[5]  B. Efron The jackknife, the bootstrap, and other resampling plans , 1987 .

[6]  Bernhard Schölkopf,et al.  Kernel Methods in Computational Biology , 2005 .

[7]  H. Kalbacher,et al.  Self-peptides from four HLA-DR alleles share hydrophobic anchor residues near the NH2-terminal including proline as a stop signal for trimming. , 1993, Journal of immunology.

[8]  M. Wauben,et al.  Definition of an extended MHC class II-peptide binding motif for the autoimmune disease-associated Lewis rat RT1.BL molecule. , 1997, International immunology.

[9]  Pingping Guan,et al.  Analysis of peptide-protein binding using amino acid descriptors: prediction and experimental verification for human histocompatibility complex HLA-A0201. , 2005, Journal of medicinal chemistry.

[10]  Thomas G. Dietterich Adaptive computation and machine learning , 1998 .

[11]  Sandra Fillebrown,et al.  The MathWorks' MATLAB , 1996 .

[12]  R. Spang,et al.  Estimating amino acid substitution models: a comparison of Dayhoff's estimator, the resolvent approach and a maximum likelihood method. , 2002, Molecular biology and evolution.

[13]  Yuanyuan Xiao,et al.  Prediction of Genomewide Conserved Epitope Profiles of HIV-1: Classifier Choice and Peptide Representation , 2005, Statistical applications in genetics and molecular biology.

[14]  Channa K. Hattotuwagama,et al.  Toward Prediction of Class II Mouse Major Histocompatibility Complex Peptide Binding Affinity: in Silico Bioinformatic Evaluation Using Partial Least Squares, a Robust Multivariate Statistical Technique , 2006, J. Chem. Inf. Model..

[15]  Martin Vingron,et al.  Modeling Amino Acid Replacement , 2000, J. Comput. Biol..

[16]  Jean-Philippe Vert,et al.  Local Alignment Kernels for Biological Sequences , 2004 .

[17]  B. Efron,et al.  The Jackknife: The Bootstrap and Other Resampling Plans. , 1983 .

[18]  Søren Brunak,et al.  Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach , 2004, Bioinform..

[19]  David A Winkler,et al.  Predictive Bayesian neural network models of MHC class II peptide binding. , 2005, Journal of molecular graphics & modelling.

[20]  H. Rammensee,et al.  SYFPEITHI: database for MHC ligands and peptide motifs , 1999, Immunogenetics.

[21]  Matthew W. Anderson,et al.  A polymorphic pocket at the P10 position contributes to peptide binding specificity in class II MHC proteins. , 2004, Chemistry & biology.

[22]  Ke Wang,et al.  Profile-based string kernels for remote homology detection and motif extraction , 2004, Proceedings. 2004 IEEE Computational Systems Bioinformatics Conference, 2004. CSB 2004..

[23]  J. Yewdell,et al.  Immunodominance in major histocompatibility complex class I-restricted T lymphocyte responses. , 1999, Annual review of immunology.

[24]  C. Granier,et al.  Modulation of TCR recognition of MHC class II/peptide by processed remote N- and C-terminal epitope extensions. , 2000, Human immunology.

[25]  Zheng Rong Yang,et al.  Prediction of T-Cell Epitopes Using Biosupport Vector Machines , 2005, J. Chem. Inf. Model..

[26]  S F Altschul,et al.  Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. , 1998, Trends in biochemical sciences.

[27]  Yang Dai,et al.  Prediction of MHC class II binding peptides based on an iterative learning model , 2005, Immunome research.

[28]  L C Harrison,et al.  Fuzzy neural network-based prediction of the motif for MHC class II binding peptides. , 2001, Journal of bioscience and bioengineering.

[29]  Bernhard Schölkopf,et al.  Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[30]  Vladimir Brusic,et al.  MHCPEP, a database of MHC-binding peptides: update 1996 , 1997, Nucleic Acids Res..

[31]  Maria V. Tejada-Simon,et al.  Naturally Processed HLA Class II Peptides Reveal Highly Conserved Immunogenic Flanking Region Sequence Preferences That Reflect Antigen Processing Rather Than Peptide-MHC Interactions , 2001, The Journal of Immunology.

[32]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[33]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[34]  Irini A. Doytchinova,et al.  Towards the in silico identification of class II restricted T-cell epitopes: a partial least squares iterative self-consistent algorithm for affinity prediction , 2003, Bioinform..

[35]  Tin Wee Tan,et al.  Prediction of HLA-DQ3.2β ligands: evidence of multiple registers in class II binding peptides , 2006 .

[36]  F. Sinigaglia,et al.  HLA class II peptide binding specificity and autoimmunity. , 1997, Advances in immunology.

[37]  J. Trowsdale,et al.  Genetics and molecular genetics of the MHC. , 1999, Reviews in immunogenetics.

[38]  Arne Elofsson,et al.  Prediction of MHC class I binding peptides, using SVMHC , 2002, BMC Bioinformatics.

[39]  Andrew E. Torda,et al.  Amino acid similarity matrices based on force fields , 2001, Bioinform..

[40]  David L. Woodland,et al.  The Majority of Immunogenic Epitopes Generate CD4+ T Cells That Are Dependent on MHC Class II-Bound Peptide-Flanking Residues , 2002, The Journal of Immunology.

[41]  Gajendra P. S. Raghava,et al.  MHCBN: a comprehensive database of MHC binding and non-binding peptides , 2003, Bioinform..

[42]  Vladimir Brusic,et al.  Prediction of MHC class II-binding peptides using an evolutionary algorithm and artificial neural network , 1998, Bioinform..

[43]  Jianming Shi,et al.  Prediction of MHC class II binders using the ant colony search strategy , 2005, Artif. Intell. Medicine.

[44]  S. Henikoff,et al.  Amino acid substitution matrices from protein blocks. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[45]  T. Hanai,et al.  Hidden Markov model-based prediction of antigenic peptides that interact with MHC class II molecules. , 2002, Journal of bioscience and bioengineering.

[46]  David Haussler,et al.  Convolution kernels on discrete structures , 1999 .

[47]  Wen Liu,et al.  Quantitative prediction of mouse class I MHC peptide binding affinity using support vector machine regression (SVR) models , 2006, BMC Bioinformatics.

[48]  Z. Nagy,et al.  Precise prediction of major histocompatibility complex class II-peptide interaction based on peptide side chain scanning , 1994, The Journal of experimental medicine.

[49]  Yoshua Bengio,et al.  No Unbiased Estimator of the Variance of K-Fold Cross-Validation , 2003, J. Mach. Learn. Res..

[50]  D. Vignali,et al.  T cell receptor recognition of MHC class II-bound peptide flanking residues enhances immunogenicity and results in altered TCR V region usage. , 1997, Immunity.

[51]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[52]  R. R. Mallios,et al.  Predicting class II MHC/peptide multi-level binding with an iterative stepwise discriminant analysis meta-algorithm , 2001, Bioinform..

[53]  Gajendra P. S. Raghava,et al.  SVM based method for predicting HLA-DRB1*0401 binding peptides in an antigen sequence , 2004, Bioinform..

[54]  Tatsuya Akutsu,et al.  Protein homology detection using string alignment kernels , 2004, Bioinform..