A hybrid discriminative/generative approach to protein fold recognition

There are two standard approaches to classification: generative methods, which use training data to estimate a probability model for each class, and discriminative methods, which construct flexible decision boundaries between the classes. An ideal classifier should combine the two. This paper presents a classifier that combines the well-known support vector machine (SVM) classifier with the regularized discriminant analysis (RDA) classifier. The hybrid classifier is applied to protein structure prediction, one of the most important goals pursued by bioinformatics. The results are promising: the hybrid classifier achieves better results than either the SVM or the RDA classifier alone, and the proposed method achieves a higher recognition ratio than other methods described in the literature.
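
For readers unfamiliar with the generative component, the following is a brief sketch of the covariance estimate used in RDA as formulated by Friedman (1989). It is background only: the abstract does not state which regularization values are used or how the RDA and SVM outputs are combined. The symbols are assumptions introduced here: S_k is the scatter matrix of class k, S is the pooled scatter matrix, N_k and N are the class and total sample counts, p is the feature dimension, and lambda and gamma are regularization parameters in [0, 1].

```latex
% Step 1: shrink each class covariance toward the pooled covariance
\hat{\Sigma}_k(\lambda) =
  \frac{(1-\lambda)\, S_k + \lambda\, S}{(1-\lambda)\, N_k + \lambda\, N}

% Step 2: shrink the result toward a scaled identity matrix
\hat{\Sigma}_k(\lambda, \gamma) =
  (1-\gamma)\, \hat{\Sigma}_k(\lambda)
  + \frac{\gamma}{p}\, \operatorname{tr}\!\left[\hat{\Sigma}_k(\lambda)\right] I
```

Setting lambda = 0 and gamma = 0 gives quadratic discriminant analysis, while lambda = 1 and gamma = 0 gives linear discriminant analysis. Intermediate values trade off between the two, which is what makes the generative model usable when classes have few training samples relative to the feature dimension.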
