Protein Fold Recognition Using Segmentation-Based Feature Extraction Model

Protein fold recognition (PFR) is considered an important step towards protein structure prediction. It also provides significant information about the general functionality of a given protein. Despite all the efforts that have been made, PFR remains unsolved. It has been shown that appropriately extracted features from the physicochemical-based attributes of the amino acids play a crucial role in addressing this problem. In this study, we explore 55 different physicochemical-based attributes using two novel feature extraction methods, namely segmented distribution and segmented density. We then propose an ensemble of classifiers based on AdaBoost.M1 and Support Vector Machine (SVM) classifiers, which are diversely trained on different combinations of features extracted from these attributes. With this ensemble, we outperform similar studies found in the literature by over 2% for the PFR task.
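The abstract names the two feature extraction methods without defining them, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that segmented density is the mean attribute value within contiguous segments of the sequence, and that segmented distribution is the fraction of the sequence needed to accumulate a given share of the attribute total. The function names, the thresholds, and the `TOY_SCALE` values are hypothetical and chosen for illustration only; they are not taken from the 55 attributes used in the study.

```python
from typing import Dict, List, Sequence

# Hypothetical toy attribute scale (illustrative values, not a real
# physicochemical attribute from the study).
TOY_SCALE: Dict[str, float] = {"A": 1.0, "G": 2.0, "L": 3.0}


def attribute_values(seq: str, scale: Dict[str, float]) -> List[float]:
    """Map each residue to its attribute value (unknown residues count as 0.0)."""
    return [scale.get(res, 0.0) for res in seq]


def segmented_density(seq: str, scale: Dict[str, float], n_segments: int) -> List[float]:
    """Mean attribute value in each of n_segments contiguous, near-equal segments.

    Assumed definition: the abstract does not specify how segments are formed.
    """
    values = attribute_values(seq, scale)
    n = len(values)
    if n_segments < 1 or n < n_segments:
        raise ValueError("need 1 <= n_segments <= len(seq)")
    features = []
    for k in range(n_segments):
        start = k * n // n_segments
        end = (k + 1) * n // n_segments
        segment = values[start:end]
        features.append(sum(segment) / len(segment))
    return features


def segmented_distribution(
    seq: str,
    scale: Dict[str, float],
    thresholds: Sequence[float] = (0.25, 0.5, 0.75),
) -> List[float]:
    """Fraction of the sequence needed to accumulate each share of the attribute total.

    Assumed definition: requires non-negative attribute values with a positive sum.
    """
    values = attribute_values(seq, scale)
    total = sum(values)
    if total <= 0 or any(v < 0 for v in values):
        raise ValueError("attribute values must be non-negative with a positive sum")
    features = []
    for t in thresholds:
        running = 0.0
        for i, v in enumerate(values, start=1):
            running += v
            if running >= t * total:
                features.append(i / len(values))
                break
        else:
            # Threshold not reached because of rounding: the whole sequence is needed.
            features.append(1.0)
    return features


if __name__ == "__main__":
    print(segmented_density("AAGGLLLL", TOY_SCALE, 4))
    print(segmented_distribution("AAGGLLLL", TOY_SCALE))
```

Under this reading, each attribute yields a short fixed-length vector per protein regardless of sequence length, which is the property a classifier such as an SVM or AdaBoost.M1 needs from its input features.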
