Operon formation is driven by co-regulation and not by horizontal gene transfer.

The organization of bacterial genes into operons was originally ascribed to the benefits of co-regulation. More recently, the "selfish operon" model, in which operons are formed by repeated gain and loss of genes, was proposed. Indeed, operons are often subject to horizontal gene transfer (HGT). On the other hand, non-HGT genes are particularly likely to be in operons. To clarify whether HGT is involved in operon formation, we identified recently formed operons in Escherichia coli K12. We show that genes that have homologs in distantly related bacteria but not in close relatives of E. coli (a pattern indicating HGT) form new operons at about the same rates as native genes. Furthermore, genes in new operons are no more likely than other genes to have phylogenetic trees that are inconsistent with the species tree. In contrast, essential genes and ubiquitous genes without paralogs (genes believed to undergo HGT rarely) often form new operons. We conclude that HGT is not a cause of operon formation but instead promotes the prevalence of pre-existing operons. To explain operon formation, we propose that new operons reduce the amount of regulatory information required to specify optimal expression patterns and infer that operons should be more likely to evolve than independent promoters when regulation is complex. Consistent with this hypothesis, operons have greater amounts of conserved regulatory sequences than do individually transcribed genes.
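The phyletic-pattern comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: the `Gene` record, the `classify` and `new_operon_rate` helpers, the label names, and the toy data are all hypothetical, and the sketch assumes that homolog presence and new-operon membership have already been determined elsewhere.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gene:
    # All fields are hypothetical inputs, assumed to be precomputed.
    name: str
    in_close_relatives: bool   # homolog found in close relatives of E. coli
    in_distant_bacteria: bool  # homolog found in distantly related bacteria
    in_new_operon: bool        # gene belongs to a recently formed operon


def classify(gene: Gene) -> str:
    """Label a gene by its phyletic pattern (illustrative scheme only)."""
    if gene.in_distant_bacteria and not gene.in_close_relatives:
        # Present far away but absent nearby: the pattern taken to indicate HGT.
        return "hgt_candidate"
    if gene.in_close_relatives:
        return "native"
    return "other"  # no homologs in either group; not covered by this sketch


def new_operon_rate(genes: List[Gene], label: str) -> Optional[float]:
    """Fraction of genes with the given label that sit in new operons."""
    selected = [g for g in genes if classify(g) == label]
    if not selected:
        return None  # rate is undefined for an empty class
    return sum(g.in_new_operon for g in selected) / len(selected)


# Toy data, invented purely to exercise the functions.
TOY_GENES = [
    Gene("a", in_close_relatives=True, in_distant_bacteria=True, in_new_operon=True),
    Gene("b", in_close_relatives=True, in_distant_bacteria=False, in_new_operon=False),
    Gene("c", in_close_relatives=False, in_distant_bacteria=True, in_new_operon=True),
    Gene("d", in_close_relatives=False, in_distant_bacteria=True, in_new_operon=False),
    Gene("e", in_close_relatives=False, in_distant_bacteria=False, in_new_operon=False),
]

if __name__ == "__main__":
    for label in ("hgt_candidate", "native"):
        print(label, new_operon_rate(TOY_GENES, label))
```

Comparing the two rates is the core of the argument: if HGT drove operon formation, the rate for HGT candidates would be expected to exceed the rate for native genes. A real analysis would also need a statistical test on the two proportions, which this sketch omits.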
