PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes

Motivation: PSORTb has remained the most precise bacterial protein subcellular localization (SCL) predictor since it was first made available in 2003. However, its recall needs to be improved, and no accurate SCL predictor yet makes predictions for archaea or differentiates important localization subcategories, such as proteins targeted to a host cell or to bacterial hyperstructures/organelles. Such improvements should preferably be encompassed in a freely available web-based predictor that can also be used as a standalone program.

Results: We developed PSORTb version 3.0 with improved recall, higher proteome-scale prediction coverage, and new refined localization subcategories. It is the first SCL predictor specifically geared for all prokaryotes, including archaea and bacteria with atypical membrane/cell wall topologies. It features an improved standalone program, with a new batch results delivery system complementing its web interface. We evaluated the most accurate SCL predictors using 5-fold cross-validation and performed an independent proteomics analysis, showing that PSORTb 3.0 is the most accurate but can benefit from being complemented by Proteome Analyst predictions.

Availability: http://www.psort.org/psortb (download open source software or use the web interface).

Contact: psort-mail@sfu.ca

Supplementary Information: Supplementary data are available at Bioinformatics online.
