Universal Internucleotide Statistics in Full Genomes: A Footprint of the DNA Structure and Packaging?

Uncovering the fundamental laws that govern the complex structural organization of DNA remains challenging and relies largely on reconstructions from primary nucleotide sequences. Here we investigate the distributions of internucleotide intervals and their persistence properties in complete genomes of various organisms, from Archaea and Bacteria to H. sapiens, aiming to reveal manifestations of a universal DNA architecture. We find that in all considered organisms the internucleotide interval distributions exhibit the same q-exponential form. While in prokaryotes a single q-exponential function gives the best fit, in eukaryotes the probability density function (PDF) additionally contains a second q-exponential, which in the human genome provides a perfect approximation over nearly 10 decades. We suggest that this functional form is a footprint of the heterogeneous DNA structure, where the first q-exponential reflects the universal helical pitch that appears in both prokaryotic and eukaryotic DNA, while the second q-exponential is a specific marker of the large-scale eukaryotic DNA organization.
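
To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that an internucleotide interval is the distance in bases between consecutive occurrences of the same nucleotide, and that the q-exponential is the standard Tsallis form [1 - (1 - q) beta x]^(1/(1 - q)). The function names, the parameters q and beta, and the mixing weight w in the two-component form are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def internucleotide_intervals(seq):
    """Map each nucleotide to the gaps between its consecutive occurrences.

    Assumption: a gap is measured in sequence positions, so it spans any
    ambiguous symbols (such as N) lying between the two occurrences.
    """
    last_seen = {}
    intervals = defaultdict(list)
    for pos, base in enumerate(seq.upper()):
        if base not in "ACGT":
            continue  # ambiguous symbols are not tracked as nucleotides
        if base in last_seen:
            intervals[base].append(pos - last_seen[base])
        last_seen[base] = pos
    return dict(intervals)


def q_exponential(x, q, beta):
    """Unnormalized Tsallis q-exponential decay; q -> 1 gives exp(-beta * x)."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(-beta * x)
    base = 1.0 - (1.0 - q) * beta * x
    # For q < 1 the function has a cutoff where the base becomes non-positive.
    return base ** (1.0 / (1.0 - q)) if base > 0 else 0.0


def double_q_exponential(x, w, q1, beta1, q2, beta2):
    """Hypothetical two-component form: a weighted sum of two q-exponentials."""
    return w * q_exponential(x, q1, beta1) + (1.0 - w) * q_exponential(x, q2, beta2)
```

For example, `internucleotide_intervals("ACGTAACG")` yields the intervals 4 and 1 for A and a single interval of 5 for both C and G. A histogram of such intervals over a full genome is the empirical distribution to which a single or double q-exponential would then be fitted.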
