Machine learning multi-classifiers for peptide classification

In this paper, we study the performance improvement that can be obtained by combining classifiers built on different representations of the peptide, each trained using a different physicochemical property of amino acids. We propose a multi-classifier that combines a classifier that approaches the task as a two-class pattern recognition problem with a method based on a one-class classifier. The multi-classifier has been tested on three problems: HIV-1 protease cleavage site prediction, recognition of T-cell epitopes, and predictive vaccinology. Combining several classifiers with the “sum rule” yields a performance improvement over the best results previously published in the literature.
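The sum-rule fusion of property-specific classifiers can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each ensemble member encodes a fixed-length peptide with one amino-acid property scale. The scales used here are the Kyte-Doolittle hydropathy scale (AAindex accession KYTJ820101) and a simplified net-charge scale assumed for illustration. A nearest-centroid rule stands in for the paper's actual two-class and one-class classifiers. The names `PropertyClassifier`, `train_ensemble` and `sum_rule`, and the toy peptides, are hypothetical. Each member's scores are z-normalized on the training set before summation, because the sum rule is only meaningful when member outputs share a common scale.

```python
import math
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy scale (AAindex accession KYTJ820101).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified net side-chain charge at neutral pH (assumption: His treated as neutral).
CHARGE = {aa: 0.0 for aa in HYDROPATHY}
CHARGE.update({"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0})


def encode(peptide, scale):
    """One feature per position: the property value of each residue."""
    return [scale[aa] for aa in peptide]


def centroid(vectors):
    """Per-position mean of equal-length feature vectors."""
    return [mean(col) for col in zip(*vectors)]


class PropertyClassifier:
    """Hypothetical ensemble member trained on a single property scale.

    two-class mode: score = d(x, negative centroid) - d(x, positive centroid)
    one-class mode: score = -d(x, positive centroid); only positives are modeled
    """

    def __init__(self, scale, one_class=False):
        self.scale = scale
        self.one_class = one_class

    def fit(self, peptides, labels):
        pos = [encode(p, self.scale) for p, y in zip(peptides, labels) if y == 1]
        neg = [encode(p, self.scale) for p, y in zip(peptides, labels) if y == 0]
        self.c_pos = centroid(pos)
        self.c_neg = None if self.one_class else centroid(neg)
        # z-score parameters estimated on training scores, so that members
        # built on different scales produce comparable outputs.
        raw = [self._raw(p) for p in peptides]
        self.mu, self.sigma = mean(raw), pstdev(raw) or 1.0
        return self

    def _raw(self, peptide):
        x = encode(peptide, self.scale)
        d_pos = math.dist(x, self.c_pos)
        if self.one_class:
            return -d_pos  # closer to the positive centroid -> higher score
        return math.dist(x, self.c_neg) - d_pos

    def score(self, peptide):
        return (self._raw(peptide) - self.mu) / self.sigma


def train_ensemble(peptides, labels, scales=(HYDROPATHY, CHARGE)):
    """One two-class and one one-class member per property scale."""
    members = []
    for scale in scales:
        members.append(PropertyClassifier(scale).fit(peptides, labels))
        members.append(PropertyClassifier(scale, one_class=True).fit(peptides, labels))
    return members


def sum_rule(members, peptide):
    """Sum rule: add the normalized member scores; higher favors the positive class."""
    return sum(m.score(peptide) for m in members)


# Toy, made-up data (not from the paper): positives rich in K/R, negatives in D/E.
TRAIN_PEPTIDES = ["KKRRK", "RKKRR", "KRKRK", "DDEED", "EEDDE", "DEDED"]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]
members = train_ensemble(TRAIN_PEPTIDES, TRAIN_LABELS)
print(sum_rule(members, "KRRKK"), sum_rule(members, "DEEDD"))
```

In this sketch, adding another physicochemical property only requires passing another scale to `train_ensemble`. The fusion step does not change, which is the main appeal of the sum rule. Any real scale would be taken from the AAindex database.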
