Statistical evidence for ancestral correlation patterns

Statistical correlations in DNA sequences are an important source of information about processes of genome evolution. As a special case of such correlations, and building on our previous work, we study how short-range correlations in eukaryotic genomes change when various classes of repetitive DNA are eliminated. Our main result is that a residual correlation pattern, common to most mammalian species, emerges once all repetitive DNA has been eliminated, suggesting features of an ancestral correlation signature. Furthermore, using this general framework, we identify classes of repeats whose deletion moves the correlation pattern towards this residual pattern (simple repeats and SINEs) or away from it (LINEs). These findings suggest that the common correlation pattern visible in mammalian species after repeat elimination can be associated with a common mammalian ancestor.
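
As a rough illustration of the kind of analysis summarized above, the following sketch computes a short-range correlation profile of a sequence before and after repeat elimination. It rests on assumptions that the abstract does not state. Mutual information between bases k positions apart stands in for the correlation measure. Repeats are assumed to be annotated by soft-masking (lowercase letters). The function names `remove_soft_masked`, `mutual_information`, and `correlation_profile` are hypothetical. Deleting repeats joins their flanking regions, so a few base pairs at each junction are artificial.

```python
from collections import Counter
from math import log2


def remove_soft_masked(seq):
    """Keep only uppercase A/C/G/T; lowercase bases are assumed to mark repeats."""
    return "".join(base for base in seq if base in "ACGT")


def mutual_information(seq, k):
    """Mutual information (in bits) between symbols k positions apart."""
    pair_counts = Counter(zip(seq, seq[k:]))
    n = sum(pair_counts.values())
    if n == 0:
        return 0.0
    # Marginal counts of the first and second symbol of each pair.
    left, right = Counter(), Counter()
    for (a, b), count in pair_counts.items():
        left[a] += count
        right[b] += count
    return sum(
        (count / n) * log2(count * n / (left[a] * right[b]))
        for (a, b), count in pair_counts.items()
    )


def correlation_profile(seq, max_k=10):
    """Mutual information for distances k = 1..max_k (assumed short-range window)."""
    return [mutual_information(seq, k) for k in range(1, max_k + 1)]


if __name__ == "__main__":
    # Toy soft-masked sequence; a real analysis would read a genome file.
    toy = "ACGTACGGTCA" + "acacacacacac" + "TTGACCGTAGCA"
    print(correlation_profile(toy, max_k=5))
    print(correlation_profile(remove_soft_masked(toy), max_k=5))
```

Comparing the two profiles, for instance by a distance between the vectors, would indicate whether removing a repeat class moves a genome's pattern towards or away from a reference pattern. The choice of that distance is left open here.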

[1]  Peter A. W. Lewis,et al.  STATIONARY DISCRETE AUTOREGRESSIVE‐MOVING AVERAGE TIME SERIES GENERATED BY MIXTURES , 1983 .

[2]  Liaofu Luo,et al.  Minimal model for genome evolution and growth. , 2002, Physical review letters.

[3]  Wentian Li,et al.  Long-range correlation and partial 1/fα spectrum in a noncoding DNA sequence , 1992 .

[4]  Nicoletta Archidiacono,et al.  Ancestral genomes reconstruction: an integrated, multi-disciplinary approach is needed. , 2006, Genome research.

[5]  Simon Easteal,et al.  Rates of genome evolution and branching order from whole genome analysis. , 2007, Molecular biology and evolution.

[6]  Wentian Li,et al.  An unusual 500, 000 bases long oscillation of guanine and cytosine content in human chromosome 21 , 2004, Comput. Biol. Chem..

[7]  Peter A. W. Lewis,et al.  Discrete time series generated by mixtures III: Autoregressive processes (DAR(p)) , 1978 .

[8]  S Karlin,et al.  Genome-scale compositional comparisons in eukaryotes. , 2001, Genome research.

[9]  R. Gregory The evolution of the genome , 2005 .

[10]  Y. Kohara,et al.  Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates. , 2007, Genome research.

[11]  D. Penny,et al.  Pika and vole mitochondrial genomes increase support for both rodent monophyly and glires. , 2002, Gene.

[12]  Ivo Grosse,et al.  Repeats and correlations in human DNA sequences. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[13]  H. Kazazian Mobile Elements: Drivers of Genome Evolution , 2004, Science.

[14]  Michael Lässig,et al.  Solvable sequence evolution models and genomic correlations. , 2005, Physical review letters.

[15]  J. Murnane,et al.  Use of a mammalian interspersed repetitive (MIR) element in the coding and processing sequences of mammalian genes. , 1995, Nucleic acids research.

[16]  Wentian Li,et al.  Universal 1/f noise, crossovers of scaling exponents, and chromosome-specific patterns of guanine-cytosine content in DNA sequences of the human genome. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[17]  Bernard B. Suh,et al.  Reconstructing contiguous regions of an ancestral genome. , 2006, Genome research.

[18]  François Chapeau-Blondeau,et al.  Autocorrelation versus entropy-based autoinformation for measuring dependence in random signal , 2007 .

[19]  H. Ellegren Microsatellites: simple sequences with complex evolution , 2004, Nature Reviews Genetics.

[20]  Cédric Chauve,et al.  A Methodological Framework for the Reconstruction of Contiguous Regions of Ancestral Genomes and Its Application to Mammalian Genomes , 2008, PLoS Comput. Biol..

[21]  Jean-Nicolas Volff,et al.  Transposable elements as drivers of genomic and biological diversity in vertebrates , 2008, Chromosome Research.

[22]  P. Deschavanne,et al.  Genomic signature: characterization and classification of species assessed by chaos game representation of sequences. , 1999, Molecular biology and evolution.

[23]  J. Craig Venter,et al.  Genome Transplantation in Bacteria: Changing One Species to Another , 2007, Science.

[24]  W. Helm,et al.  A discrete autoregressive process as a model for short-range correlations in DNA sequences , 2003 .

[25]  R. Durrett,et al.  Equilibrium distributions of microsatellite repeat length resulting from a balance between slippage events and point mutations. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[26]  Tom H. Pringle,et al.  The human genome browser at UCSC. , 2002, Genome research.

[27]  S Karlin,et al.  Comparisons of eukaryotic genomic sequences. , 1994, Proceedings of the National Academy of Sciences of the United States of America.

[28]  B. Mishra,et al.  Models of Genome Evolution , 2004 .

[29]  J. V. Moran,et al.  Mobile elements and mammalian genome evolution. , 2003, Current opinion in genetics & development.

[30]  Marc-Thorsten Hütt,et al.  Informational structure of two closely related eukaryotic genomes. , 2006, Physical review. E, Statistical, nonlinear, and soft matter physics.

[31]  J. Qi,et al.  Whole Proteome Prokaryote Phylogeny Without Sequence Alignment: A K-String Composition Approach , 2003, Journal of Molecular Evolution.

[32]  S. Buldyrev,et al.  Species independence of mutual information in coding and noncoding DNA. , 2000, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.

[33]  William Stafford Noble,et al.  Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project , 2007, Nature.

[34]  Helen Pearson,et al.  Genetic information: Codes and enigmas , 2006, Nature.

[35]  Hanspeter Herzel,et al.  10-11 bp periodicities in complete genomes reflect protein structure and DNA folding , 1999, Bioinform..

[36]  C. Peng,et al.  Long-range correlations in nucleotide sequences , 1992, Nature.

[37]  Marc-Thorsten Hütt,et al.  Genome Phylogeny Based on Short-Range Correlations in DNA Sequences , 2005, J. Comput. Biol..

[38]  Gaston H. Gonnet,et al.  A Phylogenomic Study of Human, Dog, and Mouse , 2006, PLoS Comput. Biol..

[39]  T. Kunkel DNA Replication Fidelity* , 2004, Journal of Biological Chemistry.

[40]  Marc-Thorsten Hütt,et al.  Information theory reveals large-scale synchronisation of statistical correlations in eukaryote genomes. , 2005, Gene.

[41]  Thierry Heidmann,et al.  LINE-mediated retrotransposition of marked Alu sequences , 2003, Nature Genetics.

[42]  C. A. Hutchinson,et al.  Genome transplantation in bacteria: changing one species to another. , 2007, Nature Reviews Microbiology.

[43]  Ji Qi,et al.  Prokaryote phylogeny without sequence alignment: from avoidance signature to composition distance , 2003, Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003.

[44]  G. Benson,et al.  Tandem repeats finder: a program to analyze DNA sequences. , 1999, Nucleic acids research.