A Universal Signature in Whole Genomes

Genomes are replete with duplicated sequences in the form of paralogs, transposons, pseudogenes, simple repeats, and others. To understand the origin of this phenomenon, we carried out a systematic study of the occurrence frequencies of short words in all extant complete genomes. We found a common pattern of duplications so clear and pronounced that it allows all the genomes except one to be placed in a single class expressed by an extremely simple formula. Our analysis, which includes extensive computer simulation of the growth of DNA sequences, shows that the formation of this class may be attributed to a universal genome growth mechanism in which maximally stochastic segmental duplication is the major mode of growth.
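To make the two ingredients of the abstract concrete, the following is a minimal Python sketch, not the authors' code. It counts the occurrence frequencies of overlapping k-letter words in a sequence and grows a toy sequence by repeatedly copying a randomly chosen segment to a random position. The function names, the uniform choice of segment length and position, and the parameters `seed_len`, `final_len`, and `max_segment` are all assumptions made for illustration; the abstract does not specify the model's details.

```python
import random
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-letter word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def frequency_spectrum(counts, k, alphabet="ACGT"):
    """Map each occurrence frequency to the number of k-letter words
    that occur that many times (words that never occur are counted at 0).
    Assumes the sequence contains only letters from `alphabet`."""
    spectrum = Counter(counts.values())
    spectrum[0] = len(alphabet) ** k - len(counts)
    return spectrum


def grow_by_duplication(seed_len, final_len, max_segment, rng):
    """Toy growth model (an assumption, not the paper's exact procedure):
    start from a random seed, then repeatedly copy a random segment
    and insert the copy at a random position until final_len is reached."""
    seq = [rng.choice("ACGT") for _ in range(seed_len)]
    while len(seq) < final_len:
        seg_len = rng.randint(1, min(max_segment, len(seq)))
        start = rng.randrange(len(seq) - seg_len + 1)
        segment = seq[start:start + seg_len]
        insert_at = rng.randrange(len(seq) + 1)
        seq[insert_at:insert_at] = segment
    # The last insertion may overshoot; trim to the requested length.
    return "".join(seq[:final_len])


if __name__ == "__main__":
    rng = random.Random(0)
    grown = grow_by_duplication(seed_len=200, final_len=20000,
                                max_segment=500, rng=rng)
    shuffled = "".join(rng.sample(grown, len(grown)))  # same base composition
    for name, seq in (("grown", grown), ("shuffled", shuffled)):
        spectrum = frequency_spectrum(kmer_counts(seq, 6), 6)
        print(name, sorted(spectrum.items())[:10])
```

Comparing the frequency spectrum of the grown sequence with that of its shuffled counterpart is one simple way to see how duplication alters word statistics while base composition stays fixed.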
