Evidence for growth of microbial genomes by short segmental duplications

Textual analysis of microbial genomes reveals footprints of their early evolution. It is shown that the distributions of the frequency of occurrence of words shorter than nine letters in genomes have widths that are many times those of Poisson distributions. This phenomenon suggests a simple, biologically plausible model for the growth of genomes: the genome first grows randomly to an initial length of approximately one thousand nucleotides (1 kb), or about one thousandth of its final length, and thereafter grows mainly by random short segmental duplication. We show that, with duplicated segments averaging around 25 b, sequences generated by this model possess statistical properties characteristic of present-day genomes. Both the initial length and the duplicated segment length support an RNA world at the time duplication began.
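The growth model described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the segment length is fixed at 25 b (the abstract gives only an average), copies are inserted at uniformly random positions, no point mutations are applied, and the final length is kept far below a real genome so the example runs quickly. The function names `grow_genome` and `variance_to_mean` are hypothetical. The width of a word-frequency distribution is summarized here by its variance-to-mean ratio, which is close to 1 for a Poisson distribution.

```python
import random
from collections import Counter


def grow_genome(final_len, seed_len=1000, seg_len=25, rng=None):
    """Grow a model sequence: random seed, then short segmental duplications.

    Assumptions (not from the source): fixed segment length, uniform
    choice of source and insertion positions, no mutations.
    """
    rng = rng or random.Random()
    # Phase 1: random growth up to the initial length (about 1 kb).
    genome = [rng.choice("ACGT") for _ in range(seed_len)]
    # Phase 2: copy a random short segment and insert it at a random site.
    while len(genome) < final_len:
        start = rng.randrange(len(genome) - seg_len + 1)
        segment = genome[start:start + seg_len]
        insert_at = rng.randrange(len(genome) + 1)
        genome[insert_at:insert_at] = segment
    # The last insertion may overshoot, so trim to the requested length.
    return "".join(genome[:final_len])


def variance_to_mean(seq, k):
    """Variance-to-mean ratio of the occurrence counts of all 4**k words."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    n_words = 4 ** k
    # Words that never occur still contribute a count of zero.
    values = list(counts.values()) + [0] * (n_words - len(counts))
    mean = sum(values) / n_words
    var = sum((v - mean) ** 2 for v in values) / n_words
    return var / mean


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000  # illustrative length, much shorter than a microbial genome
    model = grow_genome(n, rng=rng)
    shuffled = "".join(rng.choice("ACGT") for _ in range(n))
    for k in (2, 4, 6):
        print(k, variance_to_mean(model, k), variance_to_mean(shuffled, k))
```

Under these assumptions, the duplicated sequence should show a variance-to-mean ratio well above that of a purely random sequence of the same length, which is the qualitative signature the abstract describes; the exact values depend on the seed and the parameters chosen.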
