A genome-wide survey of short coding sequences in streptococci.

Identifying short genes that encode peptides of fewer than 60 aa is challenging, both experimentally and in silico. As a consequence, the universe of these short coding sequences (CDSs) remains largely unknown, although some are known to play important roles in cell-cell communication, particularly in Gram-positive bacteria. This paper reports a thorough search for short CDSs across streptococcal genomes. Our bioinformatic approach combined advanced intrinsic and extrinsic methods. In the first step, intrinsic sequence information (nucleotide composition and the presence of RBSs) was used to identify new short putative CDSs (spCDSs) and to eliminate the effect of differences between annotation policies. In the second step, pseudogene fragments and false predictions were filtered out. In the last step, the remaining spCDSs were screened for lines of extrinsic evidence based on sequence and gene-context comparisons. A total of 789 spCDSs across 20 complete genomes (19 Streptococcus and one Enterococcus) were supported by at least one line of extrinsic evidence, corresponding to an average of about 20 short CDSs per million base pairs. Most of these have no known function, and a significant fraction (31%) are not annotated in GenBank records, even as hypothetical genes. To illustrate the value of this list, we describe a new family of CDSs encoding very short hydrophobic peptides (20-23 aa) located immediately upstream of some positive transcriptional regulators of the Rgg family. The expression of seven further short CDSs from Streptococcus thermophilus CNRZ1066, encoding peptides of 41 to 56 aa, was confirmed by real-time quantitative RT-PCR, which revealed a variety of expression patterns. Finally, one peptide from this list, encoded by a gene not annotated in GenBank, was identified in a cell-envelope-enriched fraction of S. thermophilus CNRZ1066.
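The first, intrinsic step of such a search can be illustrated with a minimal sketch. The scanner below is a hypothetical illustration and not the authors' method: it does not reproduce their nucleotide-composition scoring, and it assumes ATG/GTG/TTG start codons, an exact GGAGG Shine-Dalgarno core searched 4 to 15 nt upstream of the start codon, a 10-aa lower bound, and a forward-strand-only scan. The names `find_short_orfs` and `ShortOrf` are invented for this example.

```python
from dataclasses import dataclass

# Assumed codon sets (bacterial starts ATG/GTG/TTG; standard stops).
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class ShortOrf:
    start: int        # 0-based index of the first base of the start codon
    end: int          # 0-based index one past the last base of the stop codon
    peptide_len: int  # length in aa, counting the start residue, excluding the stop
    has_rbs: bool     # True if the assumed RBS motif occurs in the upstream window


def find_short_orfs(seq: str, max_aa: int = 59, min_aa: int = 10,
                    rbs_motif: str = "GGAGG",
                    rbs_window: tuple[int, int] = (4, 15)) -> list[ShortOrf]:
    """Hypothetical forward-strand scan for ORFs encoding min_aa..max_aa residues.

    For each stop codon, only the most upstream in-frame start is kept, so a
    shorter nested ORF is not reported when the longest one exceeds max_aa.
    """
    seq = seq.upper()
    hits: list[ShortOrf] = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] not in START_CODONS:
                i += 3
                continue
            # Walk codon by codon to the first in-frame stop codon.
            j = i + 3
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # open reading frame runs off the end of the sequence
            aa = (j - i) // 3
            if min_aa <= aa <= max_aa:
                # Upstream window spanning rbs_window[1]..rbs_window[0] nt before the start.
                lo = max(0, i - rbs_window[1])
                hi = max(0, i - rbs_window[0])
                hits.append(ShortOrf(i, j + 3, aa, rbs_motif in seq[lo:hi]))
            i = j + 3  # resume scanning this frame after the stop codon
    return hits


if __name__ == "__main__":
    demo = "AAAAGGAGGTAAATC" + "ATG" + "AAA" * 20 + "TAA" + "CCC"
    for orf in find_short_orfs(demo):
        print(orf)
```

In a fuller pipeline, both strands would be scanned, the exact motif match would be replaced by a position-specific RBS score, and each hit would then go through the pseudogene filter and the extrinsic-evidence screen described above.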
