Identify submitochondria and subchloroplast locations with pseudo amino acid composition: approach from the strategy of discrete wavelet transform feature extraction.

Predicting protein locations at the sub-subcellular level is challenging. The key to improving prediction quality is to capture the core features of a protein that discriminate among proteins located in different subcompartments. In this study, a new formulation of pseudo amino acid composition, based on discrete wavelet transform feature extraction, was developed to predict submitochondria and subchloroplast locations. In jackknife cross-validation, the method distinguished mitochondrial proteins from chloroplast proteins with a total accuracy of 98.8% and achieved a total accuracy of 93.38% for predicting submitochondria locations. In particular, the predictive accuracies for the mitochondrial outer membrane and the chloroplast thylakoid lumen were 82.93% and 82.22%, respectively, improvements of 4.88% and 27.22% over existing methods. These results indicate that the proposed method may serve as a useful complementary technique for identifying sub-subcellular locations. The algorithm is available as an online service called SubIdent (http://bioinfo.ncu.edu.cn/services.aspx).
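The general idea of combining amino acid composition with wavelet-derived features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the wavelet, the decomposition depth, the physicochemical scale, or the summary statistics, so the Haar wavelet, the Kyte-Doolittle hydropathy scale, three decomposition levels, and the max/min/mean/standard-deviation statistics used here are all assumptions, and the function names `haar_step`, `dwt_features`, and `pseudo_aac` are hypothetical.

```python
import math
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        # Pad odd-length signals by repeating the last sample.
        signal = signal + [signal[-1]]
    root2 = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_features(sequence, levels=3):
    """Summarise the wavelet coefficients of the hydropathy signal at each level."""
    approx = [KD[aa] for aa in sequence if aa in KD]
    features = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features += [max(detail), min(detail), mean(detail), pstdev(detail)]
    # Also summarise the final low-frequency approximation.
    features += [max(approx), min(approx), mean(approx), pstdev(approx)]
    return features


def pseudo_aac(sequence, levels=3):
    """Concatenate the 20 amino acid frequencies with the wavelet statistics."""
    composition = [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]
    return composition + dwt_features(sequence, levels)


if __name__ == "__main__":
    vector = pseudo_aac("MKTAYIAKQRQISFVKSHFSRQ")  # arbitrary example sequence
    print(len(vector))
```

With these assumed settings each sequence maps to a fixed-length vector of 20 + 4 × (levels + 1) values regardless of its length, which is what allows a standard classifier to be trained on proteins of different sizes. The sketch expects a non-empty sequence of the 20 standard residues.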
