Ensemble classifier for protein fold pattern recognition

MOTIVATION: Predicting a protein's folding pattern goes one level deeper than predicting its structural class, and is therefore considerably more complicated and difficult. To deal with this challenging problem, an ensemble classifier was introduced. It is formed from a set of basic classifiers, each trained on a different set of features extracted from a training dataset: predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, and pseudo-amino acid composition of different dimensions. Each constituent classifier operates by the OET-KNN (optimized evidence-theoretic k-nearest neighbors) rule. Their outcomes are combined through weighted voting to give the final classification of a query protein. The task is to identify the true fold among 27 possible patterns.

RESULTS: The overall success rate obtained was 62% on a testing dataset in which most of the proteins have <25% sequence identity with the proteins used to train the classifier. This rate is 6-21% higher than the corresponding rates obtained by various existing NN (neural network) and SVM (support vector machine) approaches, suggesting that the ensemble classifier is very promising and might become a useful tool in protein science, as well as in proteomics and bioinformatics.

AVAILABILITY: The ensemble classifier, called PFP-Pred, is available as a web server for public use at http://202.120.37.186/bioinf/fold/PFP-Pred.htm.
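The weighted-voting step described above can be illustrated with a minimal sketch. It assumes each basic classifier has already produced one fold label (an integer from 1 to 27) and has been assigned a non-negative weight. The function name `weighted_vote`, the example weights, and the tie-breaking rule are all hypothetical; the abstract does not specify how PFP-Pred derives its weights or resolves ties.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Fuse the fold labels predicted by several basic classifiers.

    predictions: one fold label per basic classifier (e.g. ints in 1..27)
    weights:     one non-negative weight per basic classifier (assumed given)
    Returns the label with the largest total weight.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    if not predictions:
        raise ValueError("need at least one prediction")

    # Accumulate the weight received by each candidate fold.
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight

    # Highest score wins; ties go to the smallest label (an arbitrary,
    # illustrative choice that keeps the result deterministic).
    return min(scores, key=lambda label: (-scores[label], label))


# Three hypothetical classifiers: two lower-weight votes for fold 7
# together outweigh one higher-weight vote for fold 3.
final_fold = weighted_vote([3, 7, 7], [0.5, 0.3, 0.4])
```

In this toy call, fold 7 collects a total weight of about 0.7 against 0.5 for fold 3, so `final_fold` is 7. A single classifier with a large enough weight can still override several weaker ones, which is the point of weighting the votes rather than counting them.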
