Quantitative measure of randomness and order for complete genomes.

We propose an order index, $\phi$, which gives a quantitative measure of randomness and order of complete genomic sequences. It maps genomes to a number from 0 (random and of infinite length) to 1 (fully ordered) and applies regardless of sequence length. The 786 complete genomic sequences in GenBank were found to have $\phi$ values in a very narrow range, $\phi_g = 0.031^{+0.028}_{-0.015}$. We show this implies that genomes are halfway toward being completely random, that is, at the "edge of chaos." We further show that artificial "genomes" converted from literary classics have $\phi$ values that almost exactly coincide with $\phi_g$, whereas sequences of low information content do not. We infer that $\phi_g$ represents a high information-capacity "fixed point" in sequence space, and that genomes are driven to it by the dynamics of a robust growth and evolution process. We show that a growth process characterized by random segmental duplication can robustly drive genomes to the fixed point.
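To make the idea of growth by random segmental duplication concrete, the following is a minimal sketch and not the model used in the paper. The abstract gives no parameters, so the function name `grow_by_segmental_duplication`, the random seed sequence, the uniform choice of segment length and positions, and the values of `seed_len`, `target_len`, and `max_seg` are all illustrative assumptions. The sketch does not compute the order index, which is not defined in this abstract.

```python
import random


def grow_by_segmental_duplication(seed_len=100, target_len=2000,
                                  max_seg=50, rng=None):
    """Grow a DNA-like string by repeatedly copying a random segment
    and inserting the copy at a random position.

    All parameters are hypothetical; the abstract does not specify them.
    """
    if rng is None:
        rng = random.Random(0)

    # Assumption: start from a short, uniformly random seed sequence.
    seq = [rng.choice("ACGT") for _ in range(seed_len)]

    while len(seq) < target_len:
        # Pick a segment length and a start position uniformly at random.
        seg_len = rng.randint(1, min(max_seg, len(seq)))
        start = rng.randrange(0, len(seq) - seg_len + 1)
        segment = seq[start:start + seg_len]

        # Insert the copied segment at a uniformly random position.
        insert_at = rng.randrange(0, len(seq) + 1)
        seq[insert_at:insert_at] = segment

    # Trim any overshoot from the final duplication.
    return "".join(seq[:target_len])


if __name__ == "__main__":
    genome = grow_by_segmental_duplication()
    print(len(genome), genome[:60])
```

A fuller model would likely add point mutations and a chosen distribution of segment lengths; both are left out here because the abstract does not describe them.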
