Prediction of the subcellular location of apoptosis proteins with an ensemble classifier and feature selection

Apoptosis proteins play a central role in the development and homeostasis of an organism, and they are very important for understanding the mechanism of programmed cell death. The function of an apoptosis protein is closely related to its subcellular location. Because the gap between the number of known protein sequences and the number of proteins with known structures in protein databanks is growing rapidly, it is crucial to develop powerful tools for predicting the subcellular locations of apoptosis proteins. In this study, amino acid pair compositions with different spacings are used to construct feature sets that represent protein samples, and a feature selection approach based on binary particle swarm optimization is applied to extract effective features. An ensemble classifier serves as the prediction engine, with the fuzzy K-nearest neighbor algorithm as its base classifier; each base classifier is trained on a different feature set. Two datasets commonly used in prior work are selected to validate the performance of the proposed approach. The results obtained by the jackknife test are quite encouraging, indicating that the proposed method might become a useful tool for predicting the subcellular location of apoptosis proteins, or at least play a complementary role to existing methods in this area. The supplementary information and the software, written in MATLAB, are available from the corresponding author on request.
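The spaced amino acid pair features described above can be illustrated with a minimal Python sketch (the authors' own software is in MATLAB). It assumes the common k-spaced pair encoding: for a gap g, every ordered pair of the 20 standard residues separated by g positions is counted, and the counts are divided by the number of valid pairs, giving one 400-dimensional feature set per gap. The normalization, the skipping of non-standard residues, and the names `spaced_pair_composition` and `feature_sets` are illustrative assumptions, not the paper's exact definition.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def spaced_pair_composition(sequence: str, gap: int) -> list[float]:
    """Return the 400-D composition of residue pairs separated by `gap` residues.

    gap=0 gives ordinary dipeptide composition; gap=1 counts (s[i], s[i+2]), etc.
    Pairs containing non-standard residues are skipped (an assumption).
    """
    if gap < 0:
        raise ValueError("gap must be non-negative")
    seq = sequence.upper()
    counts = [0] * len(PAIRS)
    total = 0
    for i in range(len(seq) - gap - 1):
        idx = PAIR_INDEX.get(seq[i] + seq[i + gap + 1])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    # Normalize counts to frequencies over the valid pairs.
    return [c / total for c in counts]


def feature_sets(sequence: str, max_gap: int) -> dict[int, list[float]]:
    """Build one 400-D feature set per gap value 0..max_gap (hypothetical helper)."""
    return {g: spaced_pair_composition(sequence, g) for g in range(max_gap + 1)}
```

For the toy sequence `ACDA`, gap 0 yields the pairs AC, CD and DA (each with frequency 1/3), while gap 2 yields the single pair AA. Producing a separate vector per gap fits the idea of training each base classifier on its own feature set.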
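Feature selection with binary particle swarm optimization can be sketched as follows, assuming the classic binary PSO of Kennedy and Eberhart (1997): each particle is a 0/1 mask over the features, its velocity is pulled toward the particle's own best mask and the swarm's best mask, and each bit is set to 1 with probability equal to the sigmoid of its velocity. The inertia weight `w`, the velocity clamp `v_max`, and all parameter defaults are assumptions, and `toy_fitness` is a hypothetical stand-in; in a setting like this paper's, the fitness would more plausibly be an accuracy estimate of the classifier on the selected features.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iter=50,
               w=0.7, c1=2.0, c2=2.0, v_max=4.0, seed=0):
    """Maximize fitness(mask) over 0/1 masks of length n_features."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-v_max, v_max) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Velocity update toward personal and global best (w is an assumed inertia term).
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))
                vel[i][d] = v
                # Sigmoid transfer: probability that feature d is selected.
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def toy_fitness(mask):
    # Hypothetical fitness: features 0-2 are "useful", the others only add noise.
    return sum(mask[:3]) - 0.5 * sum(mask[3:])


if __name__ == "__main__":
    best_mask, best_fit = binary_pso(toy_fitness, n_features=8)
    print(best_mask, best_fit)
```

Because the fitness is treated as a black box, the same routine could score masks with any accuracy estimate; the swarm's best fitness never decreases across iterations, since `gbest` is only replaced by a strictly better mask.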
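The fuzzy K-nearest neighbor base classifier, the ensemble built over several feature sets, and the jackknife (leave-one-out) test can be sketched together. The sketch follows Keller et al.'s fuzzy K-NN with crisp training memberships, so the membership of a query in class c is the sum of the weights d^(-2/(m-1)) of its neighbours in class c, divided by the sum over all k neighbours. The abstract does not state how the base classifiers are fused; summing their membership values is an assumption made here, as are the defaults k = 3 and m = 2 and the function names.

```python
import math
from collections import defaultdict


def fuzzy_knn_memberships(train_x, train_y, query, k=3, m=2.0):
    """Fuzzy K-NN with crisp training memberships.

    Each of the k nearest neighbours votes for its own class with weight
    d ** (-2 / (m - 1)); the votes are normalized to sum to 1.
    """
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    votes = defaultdict(float)
    for d, y in nearest:
        # Guard against division by zero when the query duplicates a sample.
        votes[y] += max(d, 1e-12) ** (-2.0 / (m - 1.0))
    total = sum(votes.values())
    return {label: v / total for label, v in votes.items()}


def ensemble_predict(train_sets, train_labels, query_sets, k=3, m=2.0):
    """train_sets[s][i] is sample i under feature set s; query_sets[s] is the query under set s."""
    scores = defaultdict(float)
    for train_x, query in zip(train_sets, query_sets):
        # One base fuzzy K-NN per feature set; memberships are summed (assumed fusion rule).
        for label, u in fuzzy_knn_memberships(train_x, train_labels, query, k, m).items():
            scores[label] += u
    return max(scores, key=scores.get)


def jackknife_accuracy(feature_sets, labels, k=3, m=2.0):
    """Leave-one-out: predict each sample from all remaining samples."""
    n = len(labels)
    correct = 0
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        train_sets = [[fs[j] for j in keep] for fs in feature_sets]
        train_labels = [labels[j] for j in keep]
        query_sets = [fs[i] for fs in feature_sets]
        if ensemble_predict(train_sets, train_labels, query_sets, k, m) == labels[i]:
            correct += 1
    return correct / n
```

Each base classifier sees only its own feature set, so a feature set that is uninformative for a given protein can be outweighed by the others. The jackknife loop retrains on all remaining samples for every held-out protein; it involves no random splitting, so it gives the same result every time on a given dataset.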
