Large-scale predictions of gram-negative bacterial protein subcellular locations.

Many species of Gram-negative bacteria are pathogenic bacteria that can cause disease in a host organism. This pathogenic capability is usually associated with certain components in Gram-negative cells. Therefore, developing an automated method for fast and reliable prediction of Gram-negative protein subcellular location will allow us to not only timely annotate gene products, but also screen candidates for drug discovery. However, protein subcellular location prediction is a very difficult problem, particularly when more location sites need to be involved and when unknown query proteins do not have significant homology to proteins of known subcellular locations. PSORT-B, a recently updated version of PSORT, widely used for predicting Gram-negative protein subcellular location, only covers five location sites. Also, the data set used to train PSORT-B contains many proteins with high degrees of sequence identity in a same location group and, hence, may bear a strong homology bias. To overcome these problems, a new predictor, called "Gneg-PLoc", is developed. Featured by fusing many basic classifiers each being trained with a stringent data set containing proteins with strictly less than 25% sequence identity to one another in a same location group, the new predictor can cover eight subcellular locations; that is, cytoplasm, extracellular space, fimbrium, flagellum, inner membrane, nucleoid, outer membrane, and periplasm. In comparison with PSORT-B, the new predictor not only covers more subcellular locations, but also yields remarkably higher success rates. Gneg-PLoc is available as a Web server at http://202.120.37.186/bioinf/Gneg. To support the demand of people working in the relevant areas, a downloadable file is provided at the same Web site to list the results identified by Gneg-PLoc for 49 907 Gram-negative protein entries in the Swiss-Prot database that have no subcellular location annotations or are annotated with uncertain terms. The large-scale results will be updated twice a year to cover the new entries of Gram-negative bacterial proteins and reflect the new development of Gneg-PLoc.

[1]  Lin He,et al.  Application of Pseudo Amino Acid Composition for Predicting Protein Subcellular Location: Stochastic Signal Processing Approach , 2003, Journal of protein chemistry.

[2]  P. Aloy,et al.  Relation between amino acid composition and cellular location of proteins. , 1997, Journal of molecular biology.

[3]  Zhi-Ping Feng,et al.  Prediction of protein structural class by amino acid and polypeptide composition. , 2002, European journal of biochemistry.

[4]  Gert Lubec,et al.  Searching for hypothetical proteins: Theory and practice based upon original data and literature , 2005, Progress in Neurobiology.

[5]  K C Chou,et al.  Protein folding classes: a geometric interpretation of the amino acid composition of globular proteins. , 1994, Protein engineering.

[6]  Thierry Denoeux,et al.  A k-nearest neighbor classification rule based on Dempster-Shafer theory , 1995, IEEE Trans. Syst. Man Cybern..

[7]  Meng Wang,et al.  SLLE for predicting membrane protein types. , 2005, Journal of theoretical biology.

[8]  Cathy H. Wu,et al.  UniProt: the Universal Protein knowledgebase , 2004, Nucleic Acids Res..

[9]  Guo-Ping Zhou,et al.  An Intriguing Controversy over Protein Structural Class Prediction , 1998, Journal of protein chemistry.

[10]  X.-D. Sun,et al.  Prediction of protein structural classes using support vector machines , 2006, Amino Acids.

[11]  Sándor Pongor,et al.  The SBASE protein domain library, Release 4.0: a collection of annotated protein sequence segments , 1993, Nucleic Acids Res..

[12]  K. Nakai,et al.  PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. , 1999, Trends in biochemical sciences.

[13]  G P Zhou,et al.  Some insights into protein structural class prediction , 2001, Proteins.

[14]  M. Kanehisa,et al.  Expert system for predicting protein localization sites in gram‐negative bacteria , 1991, Proteins.

[15]  Jiang Wang,et al.  Prediction of protein structural class with Rough Sets , 2006, BMC Bioinformatics.

[16]  Dongsup Kim,et al.  Prediction of protein secondary structure content using amino acid composition and evolutionary information , 2005, Proteins.

[17]  R. Jernigan,et al.  Understanding the recognition of protein structural classes by amino acid composition , 1997, Proteins.

[18]  K. Chou,et al.  Using Functional Domain Composition and Support Vector Machines for Prediction of Protein Subcellular Location* , 2002, The Journal of Biological Chemistry.

[19]  Kuo-Chen Chou,et al.  Using optimized evidence-theoretic K-nearest neighbor classifier and pseudo-amino acid composition to predict membrane protein types. , 2005, Biochemical and biophysical research communications.

[20]  K Nishikawa,et al.  Discrimination of intracellular and extracellular proteins using amino acid composition and residue-pair frequencies. , 1994, Journal of molecular biology.

[21]  Rolf Apweiler,et al.  The SWISS-PROT protein sequence data bank and its supplement TrEMBL , 1997, Nucleic Acids Res..

[22]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[23]  Z. Feng,et al.  Prediction of the subcellular location of prokaryotic proteins based on a new representation of the amino acid composition. , 2001, Biopolymers.

[24]  C. Zhang,et al.  Predicting protein folding types by distance functions that make allowances for amino acid interactions. , 1994, The Journal of biological chemistry.

[25]  Yu-Dong Cai,et al.  Predicting protease types by hybridizing gene ontology and pseudo amino acid composition , 2006, Proteins.

[26]  Kuo-Chen Chou,et al.  Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes , 2005, Bioinform..

[27]  K. Chou,et al.  Prediction of protein structural classes. , 1995, Critical reviews in biochemistry and molecular biology.

[28]  Kuo-Chen Chou,et al.  Using pseudo amino acid composition to predict protein structural classes: Approached with complexity measure factor , 2006, J. Comput. Chem..

[29]  P. Klein,et al.  Prediction of protein structural class by discriminant analysis. , 1986, Biochimica et biophysica acta.

[30]  Kuo-Chen Chou,et al.  Prediction of Membrane Protein Types by Incorporating Amphipathic Effects , 2005, J. Chem. Inf. Model..

[31]  K Nishikawa,et al.  The folding type of a protein is relevant to the amino acid composition. , 1986, Journal of biochemistry.

[32]  Ke Wang,et al.  PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria , 2003, Nucleic Acids Res..

[33]  K. Chou Structural bioinformatics and its impact to biomedical science. , 2004, Current medicinal chemistry.

[34]  K. Chou Prediction of protein cellular attributes using pseudo‐amino acid composition , 2001, Proteins.

[35]  Y Cai,et al.  Prediction of protein structural classes by neural network. , 2000, Biochimie.

[36]  Zhi-Ping Feng,et al.  An overview on predicting the subcellular location of a protein , 2002, Silico Biol..

[37]  Alex Bateman,et al.  The InterPro database, an integrated documentation resource for protein families, domains and functional sites , 2001, Nucleic Acids Res..

[38]  J M Chandonia,et al.  Neural networks for secondary structure and structural class predictions , 1995, Protein science : a publication of the Protein Society.

[39]  P. Mahalanobis On the generalized distance in statistics , 1936 .

[40]  Guoli Wang,et al.  PISCES: a protein sequence culling server , 2003, Bioinform..

[41]  K. Chou A novel approach to predicting protein structural classes in a (20–1)‐D amino acid composition space , 1995, Proteins.

[42]  S.-W. Zhang,et al.  Prediction of protein homo-oligomer types by pseudo amino acid composition: Approached with an improved feature extraction and Naive Bayes Feature Fusion , 2006, Amino Acids.

[43]  K. Chou,et al.  Protein subcellular location prediction. , 1999, Protein engineering.

[44]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[45]  Kuo-Chen Chou,et al.  Predicting protein structural class by functional domain composition. , 2004, Biochemical and biophysical research communications.

[46]  C. DeLisi,et al.  Prediction of protein structural class from the amino acid sequence , 1986, Biopolymers.

[47]  G Deléage,et al.  An algorithm for protein secondary structure prediction based on class prediction. , 1987, Protein engineering.

[48]  K. Nakai Protein sorting signals and prediction of subcellular localization. , 2000, Advances in protein chemistry.