Prediction of Cell Wall Sorting Signals in Gram-Positive Bacteria with a Hidden Markov Model: Application to Complete Genomes

Surface proteins in Gram-positive bacteria are frequently implicated in virulence. We focus on a group of extracellular cell wall-attached proteins (CWPs) that contain an LPXTG motif, at which sortase enzymes cleave the protein and covalently couple it to peptidoglycan. We developed a hidden Markov model (HMM) for predicting the LPXTG-anchored cell wall proteins of Gram-positive bacteria and compared it against existing methods. The HMM is parsimonious in the number of freely estimated parameters, and it proved highly sensitive and specific on a training set of 55 experimentally verified LPXTG-anchored cell wall proteins as well as on reliable data sets of globular and transmembrane proteins. To identify such proteins in Gram-positive bacteria, we performed a comprehensive analysis of 94 completely sequenced genomes. In total, we identified 860 LPXTG-anchored cell wall proteins, a number substantially higher than those obtained by other available methods. Of these proteins, 237 are annotated as hypothetical proteins in SwissProt, and 88 have no homologs in the SwissProt database, which may indicate that they belong to previously unidentified families of CWPs. The prediction tool, the database of proteins identified in the genomes, and supplementary material are available online at http://bioinformatics.biol.uoa.gr/CW-PRED/.
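For orientation only, the kind of naive pattern scan that an HMM is meant to improve on can be sketched as below. This is not the model described above: it merely looks for the literal LPXTG motif (X being any residue) near the C-terminus of a protein sequence. The function name `find_lpxtg_candidates` and the 50-residue window are assumptions made for illustration.

```python
import re

# Literal LPXTG motif, where X is any single residue.
LPXTG = re.compile(r"LP.TG")


def find_lpxtg_candidates(seq, tail_window=50):
    """Return 0-based start positions of LPXTG matches that begin
    within the last `tail_window` residues of `seq`.

    `tail_window` is an illustrative assumption, not a value taken
    from the paper.
    """
    start = max(0, len(seq) - tail_window)
    # finditer(seq, start) reports positions relative to the full string.
    return [m.start() for m in LPXTG.finditer(seq, start)]


if __name__ == "__main__":
    example = "M" + "A" * 100 + "LPETG" + "A" * 20
    print(find_lpxtg_candidates(example))
```

A scan like this ignores the sequence context around the motif, which is one reason a probabilistic model can be both more sensitive and more specific than a fixed pattern.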
