Using Chou's pseudo amino acid composition to predict subcellular localization of apoptosis proteins: An approach with immune genetic algorithm-based ensemble classifier

It is crucial to develop powerful tools to predict apoptosis protein locations, given the rapidly increasing gap between the number of known structural proteins and the number of known sequences in protein databanks. In this study, based on the concept of pseudo amino acid (PseAA) composition originally introduced by Chou, a novel approximate entropy (ApEn) based PseAA composition is proposed to represent apoptosis protein sequences. An ensemble classifier, whose basic classifier is the fuzzy K-nearest neighbor (FKNN) classifier, is introduced as the prediction engine. Each basic classifier is trained on a different dimension of the PseAA composition of the protein sequences. Because the weight factors are crucial in generating the PseAA composition, the immune genetic algorithm (IGA) is used to search for their optimal values. The results obtained by the jackknife test are quite encouraging, indicating that the proposed method might become a potentially useful tool for studying protein function, or at least play a complementary role to the existing methods in the relevant areas.
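
As a rough illustration of the approximate entropy measure mentioned above, the following minimal sketch computes ApEn for a numeric series using the standard definition (embedding dimension m, tolerance r, Chebyshev distance, self-matches counted). The function name `approximate_entropy`, the default values of `m` and `r`, and the example series are assumptions made for illustration only; the abstract does not state how protein sequences are converted to numeric series or which parameter values the study uses.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy of a numeric series (standard definition).

    m and r are illustrative defaults, not values taken from the study.
    """
    n = len(series)
    if n < m + 2:
        raise ValueError("series is too short for the chosen m")

    def phi(dim):
        # All overlapping windows of length `dim`.
        count = n - dim + 1
        windows = [series[i:i + dim] for i in range(count)]
        total = 0.0
        for a in windows:
            # Count windows within tolerance r (Chebyshev distance);
            # the window always matches itself, so matches >= 1.
            matches = sum(
                1
                for b in windows
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / count)
        return total / count

    return phi(m) - phi(m + 1)


# A constant series is perfectly regular, so its ApEn is zero.
constant = [1.0] * 30
# A strictly alternating series is also highly regular (ApEn near zero).
alternating = [0.0, 1.0] * 20

apen_constant = approximate_entropy(constant)
apen_alternating = approximate_entropy(alternating)
```

Lower values indicate a more regular series. In a PseAA-style representation, values of this kind would be appended to the 20 amino acid composition components and scaled by weight factors, which is where a search procedure such as the IGA would come in.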
