Universal patterns of purifying selection at noncoding positions in bacteria.

To investigate how the number of regulatory sites per intergenic region depends on genome size, we developed a new method for detecting purifying selection at noncoding positions in clades of related bacterial genomes. We comprehensively quantified evidence of purifying selection at noncoding positions across bacteria and found several striking universal patterns. Consistent with selection acting on transcriptional regulatory elements near the transcription start, we find a universal positional profile of selection with respect to gene starts and ends, with the strongest evidence of selection immediately upstream of genes and the weakest immediately downstream. A further set of universal features indicates that selection for translation initiation efficiency is the major determinant of sequence composition around the translation start in all clades. In addition to a peak in selection at ribosomal binding sites, the region immediately around the translation start shows a universal pattern of high adenine frequency, significant selection at silent positions, and avoidance of RNA secondary structure. Surprisingly, although the number of transcription factors (TFs) increases quadratically with genome size, we present several lines of evidence that small and large genomes have the same average number of regulatory sites per intergenic region. By comparing the sequence diversity of the most and least conserved DNA words in intergenic regions across clades, we provide evidence that the structure of transcription regulatory networks changes dramatically with genome size: small genomes have a small number of TFs, each with a large number of target sites, whereas large genomes have a large number of TFs, each with a small number of target sites.
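
The scaling argument behind the final claim can be illustrated with a toy calculation. The sketch below is not the method of the paper. It assumes, purely for illustration, that the number of intergenic regions grows linearly with genome size, and it combines that assumption with the two statements above: a constant number of sites per region and a quadratic growth in TF number. The function name and all parameter values are hypothetical.

```python
def sites_per_tf(genome_size, sites_per_region=2.0,
                 regions_per_unit=1.0, tf_coeff=0.01):
    """Average number of target sites per TF in a toy scaling model.

    All parameter values are hypothetical placeholders, not estimates.
    """
    # Assumption: the number of intergenic regions is linear in genome size.
    n_regions = regions_per_unit * genome_size
    # From the text: the average number of sites per region is constant.
    n_sites = sites_per_region * n_regions
    # From the text: the number of TFs grows quadratically with genome size.
    n_tfs = tf_coeff * genome_size ** 2
    return n_sites / n_tfs


# Genome size in arbitrary units (for example, number of genes).
small = sites_per_tf(1000)
large = sites_per_tf(4000)
print(small / large)
```

Under these assumptions the total number of sites grows linearly while the number of TFs grows quadratically, so the number of sites per TF falls in inverse proportion to genome size: a genome four times larger has roughly one quarter as many target sites per TF.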
