Using Rotation Forest for Protein Fold Prediction Problem: An Empirical Study

Recent advances in pattern recognition have led to many classification algorithms being applied to the protein fold prediction problem. In this paper, a recently introduced method called Rotation Forest, which builds an ensemble of classifiers using bootstrap sampling and feature extraction, is implemented and applied to this problem. Rotation Forest is a straightforward extension of bagging that aims to promote diversity within the ensemble through feature extraction using Principal Component Analysis (PCA). We compare its performance with that of other meta classifiers based on boosting and bagging: AdaBoost.M1, LogitBoost, Bagging, and Random Forest. Experimental results show that Rotation Forest achieves higher protein fold prediction accuracy than the other meta classifiers applied, and higher accuracy than previous work reported in the literature.
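
The core step that gives each ensemble member its own view of the data, a PCA-based rotation of the feature space, can be illustrated with a minimal sketch. This is not the implementation used in the study: the function name `build_rotation_matrix`, the `sample_frac=0.75` default, the number of feature subsets, and the synthetic data are all assumptions chosen for illustration, and the per-subset class selection sometimes used with Rotation Forest is omitted for brevity. Only NumPy is assumed.

```python
import numpy as np


def build_rotation_matrix(X, n_subsets, rng, sample_frac=0.75):
    """Build one rotation matrix for a single ensemble member (illustrative)."""
    n_samples, n_features = X.shape
    # Randomly partition the feature indices into disjoint subsets.
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    rotation = np.zeros((n_features, n_features))
    for cols in subsets:
        if len(cols) == 0:
            continue  # more subsets than features: nothing to rotate here
        # Bootstrap sample of the rows (sample_frac is an assumed setting).
        n_boot = max(len(cols) + 1, int(sample_frac * n_samples))
        rows = rng.choice(n_samples, size=n_boot, replace=True)
        block = X[np.ix_(rows, cols)]
        centered = block - block.mean(axis=0)
        # PCA via SVD: the rows of vt are the principal axes.
        # All components are kept, so no information is discarded.
        _, _, vt = np.linalg.svd(centered, full_matrices=True)
        # Place the loadings at the positions of the original features.
        rotation[np.ix_(cols, cols)] = vt.T
    return rotation


# Synthetic stand-in for a protein feature matrix (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))
R = build_rotation_matrix(X, n_subsets=3, rng=rng)
X_rot = X @ R  # rotated features on which a base classifier would be trained
```

In a full ensemble, this step would be repeated with a fresh random partition for every member, a base classifier would be trained on each `X @ R`, and the members' outputs would be combined. Because every principal component is retained, each `R` is an orthogonal matrix, so the rotation changes the axes the base classifier sees without discarding information.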
