Using stacked generalization to predict membrane protein types based on pseudo-amino acid composition.

Membrane proteins are vital to many biological processes and have become attractive targets for both basic research and drug design. Knowing the type of a membrane protein often provides useful clues for deducing the function of an uncharacterized membrane protein. With the unprecedented growth in newly found protein sequences in the post-genomic era, there is a strong demand for an automated method that identifies membrane protein types quickly and accurately from their amino acid sequences. Although quite a few identifiers have been developed for this purpose through various approaches, such as covariant discriminant (CD), support vector machine (SVM), artificial neural network (ANN), and K-nearest neighbor (KNN) classifiers, each of them performs the identification as a single, individual classifier. As is well known, wise persons making critical decisions usually take into account the opinions of several experts rather than rely on only one. Likewise, a sophisticated identifier should be trained by several different modes. In view of this, a novel approach called "stacked generalization" or "stacking" has been introduced, based on the framework of pseudo-amino acid composition, which can incorporate a considerable amount of sequence-order effects. Unlike the "bagging" and "boosting" approaches, which only combine classifiers of the same type, the stacking approach can combine several different types of classifiers through a meta-classifier to maximize the generalization accuracy. The results thus obtained were very encouraging. It is anticipated that the stacking approach may also hold high potential for improving the identification quality of many other protein attributes, among them subcellular location, enzyme family class, protease type, and protein-protein interaction type. The stacked generalization classifier is available as a web server named "SG-MPt_Pred" at: http://202.120.37.186/bioinf/wangsq/service.htm.
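The stacking idea described above can be illustrated with a minimal sketch. It is not the SG-MPt_Pred implementation: the two base learners (1-nearest neighbor and nearest centroid), the lookup-table meta-classifier, the function names, the class labels, and the 2-D toy feature vectors are all assumptions chosen for brevity. In practice the feature vectors would be pseudo-amino acid composition vectors and the base learners would be stronger classifiers such as SVM or KNN.

```python
from collections import Counter, defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_1nn(X, y):
    """Level-0 learner A (illustrative): 1-nearest neighbor."""
    def predict(x):
        return min(zip(X, y), key=lambda pair: sq_dist(pair[0], x))[1]
    return predict

def fit_centroid(X, y):
    """Level-0 learner B (illustrative): nearest class centroid."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    centroids = {c: [sum(col) / len(rows) for col in zip(*rows)]
                 for c, rows in groups.items()}
    def predict(x):
        return min(centroids, key=lambda c: sq_dist(centroids[c], x))
    return predict

def fit_meta(Z, y):
    """Level-1 learner: majority true label per tuple of base predictions."""
    table = defaultdict(Counter)
    for z, yi in zip(Z, y):
        table[tuple(z)][yi] += 1
    def predict(z):
        votes = table.get(tuple(z))
        # Unseen combination of base outputs: fall back to the first base learner.
        return votes.most_common(1)[0][0] if votes else z[0]
    return predict

def fit_stacking(X, y, base_fitters, k=3):
    """Train base learners of different types and a meta-classifier on top."""
    n = len(X)
    Z = [None] * n  # level-1 training data: out-of-fold base predictions
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        models = [fit([X[i] for i in train], [y[i] for i in train])
                  for fit in base_fitters]
        for i in held_out:
            Z[i] = [m(X[i]) for m in models]
    meta = fit_meta(Z, y)
    # Refit the base learners on all data for use at prediction time.
    final_models = [fit(X, y) for fit in base_fitters]
    def predict(x):
        return meta([m(x) for m in final_models])
    return predict

# Toy, well-separated 2-D data standing in for real feature vectors.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.2], [0.1, 0.1],
     [1.0, 0.9], [0.8, 1.0], [0.9, 0.7], [0.7, 0.8], [0.9, 0.9], [1.0, 1.0]]
y = ["type-I"] * 6 + ["multipass"] * 6

model = fit_stacking(X, y, [fit_1nn, fit_centroid])
print(model([0.15, 0.15]), model([0.95, 0.85]))
```

The key design point is that the meta-classifier is trained on out-of-fold predictions, so it learns how much to trust each base learner on data that learner has not seen, rather than on predictions the base learners have already memorized.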
