A mixture of physicochemical and evolutionary-based feature extraction approaches for protein fold recognition

Recent advances in pattern recognition have stimulated considerable interest in Protein Fold Recognition (PFR), which is considered a crucial step towards protein structure prediction and drug design. Despite recent achievements, PFR remains an unsolved problem in biological science, and its prediction accuracy is still unsatisfactory. Furthermore, the impact of using a wide range of physicochemical-based attributes on PFR has not been adequately explored. In this study, we propose a novel mixture of physicochemical and evolutionary-based feature extraction methods built on the concepts of segmented distribution and segmented density. We also explore the impact of 55 different physicochemical-based attributes on PFR. Our results show that, by providing more local discriminatory information and by benefiting from physicochemical and evolutionary-based features simultaneously, we can improve protein fold prediction accuracy by up to 5% over previously reported results in the literature.
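The abstract names segmented distribution and segmented density without defining them, so the following Python sketch shows only one plausible reading of those two ideas and should not be taken as the exact formulation used in this study. It assumes that density means the average attribute value inside each contiguous segment of the sequence, and that distribution means the relative position at which the running sum of attribute values reaches a given fraction of its total. The function names, the shift to non-negative values, and the three-residue toy scale are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Sequence


def attribute_profile(seq: str, scale: Dict[str, float]) -> List[float]:
    """Map each residue to its attribute value, shifted so every value is >= 0.

    The shift is an assumption made here so that running sums are monotone.
    """
    lowest = min(scale.values())
    return [scale[aa] - lowest for aa in seq]


def segmented_density(profile: Sequence[float], n_segments: int) -> List[float]:
    """Mean attribute value inside each of n_segments contiguous, near-equal chunks."""
    n = len(profile)
    feats = []
    for k in range(n_segments):
        start = k * n // n_segments
        end = (k + 1) * n // n_segments
        chunk = profile[start:end]
        feats.append(sum(chunk) / len(chunk) if chunk else 0.0)
    return feats


def segmented_distribution(profile: Sequence[float],
                           fractions: Sequence[float]) -> List[float]:
    """Relative position where the running sum first reaches each fraction of the total."""
    n = len(profile)
    total = sum(profile)
    feats = []
    for f in fractions:
        if n == 0 or total == 0:
            feats.append(0.0)
            continue
        running = 0.0
        position = n  # fall back to the last residue if the target is never reached
        for i, value in enumerate(profile, start=1):
            running += value
            if running >= f * total:
                position = i
                break
        feats.append(position / n)
    return feats


# Hypothetical toy scale covering three residues only (not a published index).
TOY_SCALE = {"G": 0.0, "A": 1.0, "L": 3.0}

profile = attribute_profile("GGALLA", TOY_SCALE)
density_features = segmented_density(profile, 3)
distribution_features = segmented_distribution(profile, [0.25, 0.5, 0.75])
feature_vector = density_features + distribution_features
```

Under these assumptions, one such vector would be computed per physicochemical attribute and the vectors concatenated. Because each feature is tied to a region of the sequence rather than to the sequence as a whole, this kind of descriptor carries local information of the sort the abstract refers to.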
