SecretP: identifying bacterial secreted proteins by fusing new features into Chou's pseudo-amino acid composition.

Protein secretion plays an important role in bacterial lifestyles. Secreted proteins are crucial for bacterial pathogenesis because they mediate the interaction of bacteria with their environments, particularly the delivery of pathogenic and symbiotic bacteria into their eukaryotic hosts. Identifying bacterial secreted proteins is therefore an important step in the study of various diseases and the corresponding drugs. In this paper, several new features are fused into Chou's pseudo-amino acid composition (PseAAC), and two support vector machine (SVM)-based ternary classifiers are developed to predict secreted proteins of Gram-negative and Gram-positive bacteria. For the two types of bacteria, our method achieves accuracies of 94.03% and 94.36%, respectively, in distinguishing classically secreted, non-classically secreted and non-secreted proteins. To compare the practical ability of our method with that of six published methods, proteins from Escherichia coli and Bacillus subtilis are collected to construct test sets for Gram-negative and Gram-positive bacteria, respectively; the prediction results of our method are comparable to those of the existing methods. When applied to two public independent data sets for predicting non-classically secreted proteins (NCSPs), the method also yields satisfactory results for Gram-negative bacterial proteins. The prediction server SecretP can be accessed at http://cic.scu.edu.cn/bioinformatics/secretPV2/index.htm.
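As an illustration of the kind of descriptor the classifiers build on, the following is a minimal sketch of a basic pseudo-amino acid composition vector. It is not the feature set used by SecretP: the new features fused in this work are not described in the abstract and are not reproduced here. The function name `pseaac`, the parameters `lam` and `weight`, their default values, and the use of a single Kyte-Doolittle hydropathy scale are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values; using this single scale is an
# assumption of the sketch, not a choice stated in the abstract.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def normalize(scale):
    """Rescale a property to zero mean and unit deviation over the 20 residues."""
    mu = mean(scale.values())
    sd = pstdev(scale.values())
    return {aa: (v - mu) / sd for aa, v in scale.items()}


def pseaac(seq, lam=3, weight=0.05, scale=KD):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector.

    lam    -- number of sequence-order correlation tiers (hypothetical default)
    weight -- weight of the correlation terms (hypothetical default)
    """
    seq = seq.upper()
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    if any(aa not in scale for aa in seq):
        raise ValueError("sequence contains a non-standard residue")

    h = normalize(scale)
    counts = Counter(seq)
    freqs = [counts[aa] / len(seq) for aa in AMINO_ACIDS]

    # Tier k averages the squared property difference between residues
    # that are k positions apart along the chain.
    thetas = [
        mean((h[a] - h[b]) ** 2 for a, b in zip(seq, seq[k:]))
        for k in range(1, lam + 1)
    ]

    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

Calling `pseaac("MKKLLPTAAAGLLLLAAQPAMA")` returns 23 non-negative values that sum to 1: the first 20 are the rescaled amino acid frequencies, and the last three carry sequence-order information. Vectors of this form could then be passed to any multi-class SVM implementation.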
