Prediction protein structural classes with pseudo-amino acid composition: approximate entropy and hydrophobicity pattern.

Compared with the conventional amino acid (AA) composition, the pseudo-amino acid (PseAA) composition as originally introduced for protein subcellular location prediction can incorporate much more information of a protein sequence, so as to remarkably enhance the power of using a discrete model to predict various attributes of a protein. In this study, based on the concept of PseAA composition, the approximate entropy and hydrophobicity pattern of a protein sequence are used to characterize the PseAA components. Also, the immune genetic algorithm (IGA) is applied to search the optimal weight factors in generating the PseAA composition. Thus, for a given protein sequence sample, a 27-D (dimensional) PseAA composition is generated as its descriptor. The fuzzy K nearest neighbors (FKNN) classifier is adopted as the prediction engine. The results thus obtained in predicting protein structural classification are quite encouraging, indicating that the current approach may also be used to improve the prediction quality of other protein attributes, or at least can play a complimentary role to the existing methods in the relevant areas. Our algorithm is written in Matlab that is available by contacting the corresponding author.

[1]  J. M. Zimmerman,et al.  The characterization of amino acid sequences in proteins by statistical methods. , 1968, Journal of theoretical biology.

[2]  V. Lim Algorithms for prediction of α-helical and β-structural regions in globular proteins , 1974 .

[3]  K. Chou,et al.  Hum-mPLoc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites. , 2007, Biochemical and biophysical research communications.

[4]  I. Cosic Macromolecular bioactivity: is it resonant interaction between macromolecules?-theory and applications , 1994, IEEE Transactions on Biomedical Engineering.

[5]  Xiaoyong Zou,et al.  Using pseudo-amino acid composition and support vector machine to predict protein structural class. , 2006, Journal of theoretical biology.

[6]  K. Chou,et al.  Predicting protein structural classes from amino acid composition: application of fuzzy clustering. , 1995, Protein engineering.

[7]  K. Chou Prediction of protein cellular attributes using pseudo‐amino acid composition , 2001, Proteins.

[8]  Yanda Li,et al.  Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence , 2006, BMC Bioinformatics.

[9]  Michael G Sadovsky,et al.  The method to compare nucleotide sequences based on the minimum entropy principle , 2003, Bulletin of mathematical biology.

[10]  Scott Dick,et al.  Classifier ensembles for protein structural class prediction with varying homology. , 2006, Biochemical and biophysical research communications.

[11]  G M Maggiora,et al.  A heuristic approach to predicting the tertiary structure of bovine somatotropin. , 1991, Biochemistry.

[12]  K. R. Woods,et al.  Prediction of protein antigenic determinants from amino acid sequences. , 1981, Proceedings of the National Academy of Sciences of the United States of America.

[13]  Kuo-Chen Chou,et al.  Using stacked generalization to predict membrane protein types based on pseudo-amino acid composition. , 2006, Journal of theoretical biology.

[14]  K. Chou,et al.  A key driving force in determination of protein structural classes. , 1999, Biochemical and biophysical research communications.

[15]  C. Zhang,et al.  Predicting protein folding types by distance functions that make allowances for amino acid interactions. , 1994, The Journal of biological chemistry.

[16]  Kuo-Chen Chou,et al.  Predicting protein structural class by functional domain composition. , 2004, Biochemical and biophysical research communications.

[17]  Peixiang Cai,et al.  Predicting protein structural class with pseudo-amino acid composition and support vector machine fusion network. , 2006, Analytical biochemistry.

[18]  M. Levitt,et al.  Conformation of amino acid side-chains in proteins. , 1978, Journal of molecular biology.

[19]  Howard Leung,et al.  Prediction of membrane protein types from sequences and position-specific scoring matrices. , 2007, Journal of theoretical biology.

[20]  Kuo-Chen Chou,et al.  Using optimized evidence-theoretic K-nearest neighbor classifier and pseudo-amino acid composition to predict membrane protein types. , 2005, Biochemical and biophysical research communications.

[21]  C. DeLisi,et al.  Prediction of protein structural class from the amino acid sequence , 1986, Biopolymers.

[22]  P. Klein,et al.  Prediction of protein structural class by discriminant analysis. , 1986, Biochimica et biophysica acta.

[23]  Kuo-Chen Chou,et al.  Fuzzy KNN for predicting membrane protein types from pseudo-amino acid composition. , 2006, Journal of theoretical biology.

[24]  Z. Huang,et al.  Using cellular automata images and pseudo amino acid composition to predict protein subcellular location , 2005, Amino Acids.

[25]  Guo-Ping Zhou,et al.  An Intriguing Controversy over Protein Structural Class Prediction , 1998, Journal of protein chemistry.

[26]  Jishou Ruan,et al.  Novel scales based on hydrophobicity indices for secondary protein structure. , 2007, Journal of theoretical biology.

[27]  G P Zhou,et al.  Some insights into protein structural class prediction , 2001, Proteins.

[28]  G. Rose,et al.  Hydrophobicity of amino acid residues in globular proteins. , 1985, Science.

[29]  Kuo-Chen Chou,et al.  Correlations of amino acids in proteins , 2003, Peptides.

[30]  J M Chandonia,et al.  Neural networks for secondary structure and structural class predictions , 1995, Protein science : a publication of the Protein Society.

[31]  Lukasz A. Kurgan,et al.  Prediction of structural classes for protein sequences and domains - Impact of prediction algorithms, sequence representation and homology, and test procedures on accuracy , 2006, Pattern Recognit..

[32]  G Deléage,et al.  An algorithm for protein secondary structure prediction based on class prediction. , 1987, Protein engineering.

[33]  M. Wang,et al.  Weighted-support vector machines for predicting membrane protein types based on pseudo-amino acid composition. , 2004, Protein engineering, design & selection : PEDS.

[34]  P. Y. Chou,et al.  Prediction of Protein Structural Classes from Amino Acid Compositions , 1989 .

[35]  Da-Peng Li,et al.  Amino Acid Principal Component Analysis (AAPCA) and its Applications in Protein Structural Class Prediction , 2006, Journal of biomolecular structure & dynamics.

[36]  Ying-Li Chen,et al.  Prediction of the subcellular location of apoptosis proteins. , 2007, Journal of theoretical biology.

[37]  G. Fasman Prediction of Protein Structure and the Principles of Protein Conformation , 2012, Springer US.

[38]  Qianzhong Li,et al.  Using pseudo amino acid composition to predict protein structural class: Approached by incorporating 400 dipeptide components , 2007, J. Comput. Chem..

[39]  Ying Huang,et al.  Prediction of protein subcellular locations using fuzzy k-NN method , 2004, Bioinform..

[40]  Kuo-Chen Chou,et al.  Large‐scale plant protein subcellular location prediction , 2007, Journal of cellular biochemistry.

[41]  Zheng Yuan,et al.  How good is prediction of protein structural class by the component‐coupled method? , 2000, Proteins.

[42]  Shiow-Fen Hwang,et al.  Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method , 2007, Biosyst..

[43]  Sukanta Mondal,et al.  Pseudo amino acid composition and multi-class support vector machines approach for conotoxin superfamily classification. , 2006, Journal of theoretical biology.

[44]  L. Kier,et al.  Amino acid side chain parameters for correlation studies in biology and pharmacology. , 2009, International journal of peptide and protein research.

[45]  K. Chou A novel approach to predicting protein structural classes in a (20–1)‐D amino acid composition space , 1995, Proteins.

[46]  Guo-Ping Zhou,et al.  Subcellular location prediction of apoptosis proteins , 2002, Proteins.

[47]  Yongsheng Ding,et al.  Prediction of protein subcellular location using hydrophobic patterns of amino acid sequence , 2006, Comput. Biol. Chem..

[48]  H.-B. Shen,et al.  Euk-PLoc: an ensemble classifier for large-scale eukaryotic protein subcellular location prediction , 2007, Amino Acids.

[49]  C. Chothia,et al.  Structural patterns in globular proteins , 1976, Nature.

[50]  S.-W. Zhang,et al.  Prediction of protein homo-oligomer types by pseudo amino acid composition: Approached with an improved feature extraction and Naive Bayes Feature Fusion , 2006, Amino Acids.

[51]  K. Chou Progress in protein structural class prediction and its impact to bioinformatics and proteomics. , 2005, Current protein & peptide science.

[52]  K. Chou,et al.  Gpos-PLoc: an ensemble classifier for predicting subcellular localization of Gram-positive bacterial proteins. , 2007, Protein engineering, design & selection : PEDS.

[53]  Jiang Wang,et al.  Prediction of protein structural class with Rough Sets , 2006, BMC Bioinformatics.

[54]  C. Tanford Contribution of Hydrophobic Interactions to the Stability of the Globular Conformation of Proteins , 1962 .

[55]  O. Ptitsyn,et al.  Why do globular proteins fit the limited set of folding patterns? , 1987, Progress in biophysics and molecular biology.

[56]  Kuo-Chen Chou,et al.  MemType-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM. , 2007, Biochemical and biophysical research communications.

[57]  Ying-Li Chen,et al.  Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo-amino acid composition. , 2007, Journal of theoretical biology.

[58]  K. Chou,et al.  Using Functional Domain Composition and Support Vector Machines for Prediction of Protein Subcellular Location* , 2002, The Journal of Biological Chemistry.

[59]  K. Chou,et al.  Protein subcellular location prediction. , 1999, Protein engineering.

[60]  K. Chou,et al.  Euk-mPLoc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites. , 2007, Journal of proteome research.

[61]  P. Argos,et al.  Structural prediction of membrane-bound proteins. , 2005, European journal of biochemistry.

[62]  Kuo-Chen Chou,et al.  Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes , 2005, Bioinform..

[63]  S.-W. Zhang,et al.  Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition , 2007, Amino Acids.

[64]  Zhanchao Li,et al.  Using Chou's amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. , 2007, Journal of theoretical biology.

[65]  Kuo-Chen Chou,et al.  Signal-CF: a subsite-coupled and window-fusing approach for predicting signal peptides. , 2007, Biochemical and biophysical research communications.

[66]  Meng Wang,et al.  Using Fourier Spectrum Analysis and Pseudo Amino Acid Composition for Prediction of Membrane Protein Types , 2005, The protein journal.

[67]  K Nishikawa,et al.  The folding type of a protein is relevant to the amino acid composition. , 1986, Journal of biochemistry.

[68]  J. Janin,et al.  Surface and inside volumes in globular proteins , 1979, Nature.

[69]  K. Chou,et al.  Energy-optimized structure of antifreeze protein and its binding mechanism. , 1992, Journal of molecular biology.

[70]  S M Pincus,et al.  Approximate entropy as a measure of system complexity. , 1991, Proceedings of the National Academy of Sciences of the United States of America.

[71]  Kuo-Chen Chou,et al.  Ensemble classifier for protein fold pattern recognition , 2006, Bioinform..

[72]  J. Richman,et al.  Physiological time-series analysis using approximate entropy and sample entropy. , 2000, American journal of physiology. Heart and circulatory physiology.

[73]  Kuo-Chen Chou,et al.  Using pseudo amino acid composition to predict protein structural classes: Approached with complexity measure factor , 2006, J. Comput. Chem..

[74]  Kuo-Chen Chou,et al.  Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition. , 2005, Biochemical and biophysical research communications.

[75]  K. Chou,et al.  Virus-PLoc: a fusion classifier for predicting the subcellular localization of viral proteins within host and virus-infected cells. , 2007, Biopolymers.

[76]  R Langridge,et al.  Improvements in protein secondary structure prediction by an enhanced neural network. , 1990, Journal of molecular biology.

[77]  Hiroyuki Ogata,et al.  AAindex: Amino Acid Index Database , 1999, Nucleic Acids Res..

[78]  Hao Lin,et al.  Predicting conotoxin superfamily and family by using pseudo amino acid composition and modified Mahalanobis discriminant. , 2007, Biochemical and biophysical research communications.

[79]  D. Connelly,et al.  Cross‐validation of protein structural class prediction using statistical clustering and neural networks , 1993, Protein science : a publication of the Protein Society.

[80]  K. Chou,et al.  Prediction of protein structural classes. , 1995, Critical reviews in biochemistry and molecular biology.

[81]  Kuo-Chen Chou,et al.  Using supervised fuzzy clustering to predict protein structural classes. , 2005, Biochemical and biophysical research communications.

[82]  Tongliang Zhang,et al.  Using pseudo amino acid composition and binary-tree support vector machines to predict protein structural classes , 2007, Amino Acids.

[83]  James M. Keller,et al.  A fuzzy K-nearest neighbor algorithm , 1985, IEEE Transactions on Systems, Man, and Cybernetics.

[84]  Zhi-Ping Feng,et al.  Prediction of protein structural class by amino acid and polypeptide composition. , 2002, European journal of biochemistry.

[85]  M. Wang,et al.  Low-frequency Fourier spectrum for predicting membrane protein types. , 2005, Biochemical and biophysical research communications.