Prediction of MHC class I binding peptides, using SVMHC

Background

T-cells are key players in regulating a specific immune response. Activation of cytotoxic T-cells requires recognition of specific peptides bound to Major Histocompatibility Complex (MHC) class I molecules. MHC-peptide complexes are potential tools for the diagnosis and treatment of infectious disease and cancer, as well as for the development of peptide vaccines. Only one in 100 to 200 potential binders actually binds to a given MHC molecule; a good prediction method for MHC class I binding peptides can therefore reduce the number of candidate binders that need to be synthesized and tested.

Results

Here, we present a novel approach, SVMHC, based on support vector machines (SVMs), to predict the binding of peptides to MHC class I molecules. This method appears to perform slightly better than two profile-based methods, SYFPEITHI and HLA_BIND. The implementation of SVMHC is simple and involves no manual steps, so as more data become available, predictions for additional MHC types are trivial to provide. SVMHC currently provides predictions for 26 MHC class I types from the MHCPEP database or, alternatively, for 6 MHC class I types from the higher-quality SYFPEITHI database. The prediction models for these MHC types are implemented in a public web service available at http://www.sbc.su.se/svmhc/.

Conclusions

Prediction of MHC class I binding peptides using support vector machines shows high performance and is easy to apply to a large number of MHC class I types. As more peptide data are added to MHC databases, SVMHC can easily be updated to give predictions for additional MHC class I types. We suggest that at least 20 binding peptide sequences are needed for SVM training.
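The abstract does not say how peptides are presented to the SVM. The following is a minimal sketch of the general idea, not the SVMHC implementation. It assumes a sparse (one-hot) encoding of 9-mer peptides (20 binary features per position) and trains a linear SVM with the basic Pegasos stochastic subgradient update, used here as a plain-Python stand-in for a dedicated SVM solver. The function names, the parameter values (`lam`, `epochs`) and the toy peptides are hypothetical, chosen only for illustration.

```python
import random

# 20 standard amino acids; each peptide position becomes a 20-dim one-hot block.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
PEPTIDE_LENGTH = 9  # assumption: nonamer peptides only


def encode(peptide):
    """Sparse (one-hot) encoding: 9 positions x 20 residues = 180 binary features."""
    if len(peptide) != PEPTIDE_LENGTH:
        raise ValueError(f"expected a {PEPTIDE_LENGTH}-mer, got {peptide!r}")
    vec = [0.0] * (len(AMINO_ACIDS) * PEPTIDE_LENGTH)
    for pos, aa in enumerate(peptide):
        vec[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return vec


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(peptides, labels, lam=0.1, epochs=200, seed=0):
    """Linear SVM trained with the basic Pegasos update (stochastic subgradient
    descent on the L2-regularised hinge loss). labels: +1 binder, -1 non-binder.
    No explicit bias term: with one-hot encoding every peptide has exactly one
    active feature per position, so a constant offset is already representable."""
    rng = random.Random(seed)
    data = [(encode(p), y) for p, y in zip(peptides, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            violated = y * dot(w, x) < 1.0  # inside the margin: hinge loss > 0
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularisation shrink
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def score(w, peptide):
    """Signed SVM output; > 0 means predicted binder."""
    return dot(w, encode(peptide))


# Hypothetical toy data (NOT real MHC ligands): each 7-residue backbone fills
# P1 and P3-P8; "binders" get L at P2 and V at P9, "non-binders" D at P2, K at P9.
BACKBONES = ["ACDEFGH", "IKLMNPQ", "RSTVWYA", "CDEFGHI", "KLMNPQR", "STVWYAC"]
TRAIN_PEPTIDES = (
    [b[0] + "L" + b[1:] + "V" for b in BACKBONES]
    + [b[0] + "D" + b[1:] + "K" for b in BACKBONES]
)
TRAIN_LABELS = [1] * len(BACKBONES) + [-1] * len(BACKBONES)

if __name__ == "__main__":
    weights = train_linear_svm(TRAIN_PEPTIDES, TRAIN_LABELS)
    for pep in ["DLEFGHIKV", "DDEFGHIKK"]:  # backbone not seen in training
        print(pep, round(score(weights, pep), 3))
```

In this toy setup the only features that separate the classes are the residues at P2 and P9, so the learned weights for L and V at those positions end up positive and those for D and K negative. Six binders is far fewer than the 20 the abstract recommends; the example only illustrates the mechanics. A real predictor would be trained per MHC type on curated ligands such as those in SYFPEITHI or MHCPEP.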