Correlations in DNA sequences across the three domains of life

Abstract: We report statistical studies of the correlation properties of ∼7500 gene sequences, covering coding (exon) and non-coding (intron) sequences for DNA and primary amino acid sequences for proteins, across all three domains of life, namely Eukaryotes (cells with nuclei), Prokaryotes (bacteria) and Archaea (archaebacteria). Mutual information function, power spectrum and Hölder exponent analyses show that the exons have somewhat greater correlation content than the introns studied. These results are further confirmed with hypothesis testing. While ∼30% of the Eukaryote coding sequences show distinct correlations above the noise threshold, this is true for only ∼10% of the Prokaryote and Archaea coding sequences. For protein sequences, we observe correlation lengths similar to those of “random” sequences.
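As a rough illustration of the first of these measures, the mutual information function I(k) of a symbol sequence can be estimated from the frequencies of symbol pairs separated by k positions. The sketch below is a minimal plug-in estimator; the function name `mutual_information`, the toy sequence, and the choice of base-2 logarithms are assumptions made for illustration. The abstract does not specify the estimator, finite-size corrections, or noise threshold actually used in the study.

```python
import math
from collections import Counter


def mutual_information(seq, k):
    """Plug-in estimate of I(k), in bits, between symbols k positions apart.

    Illustrative sketch only: no finite-size bias correction is applied.
    """
    n = len(seq) - k  # number of symbol pairs at separation k
    if k < 1 or n < 1:
        raise ValueError("k must satisfy 1 <= k < len(seq)")

    # Joint counts of (symbol at i, symbol at i + k) and the two marginals,
    # all taken over the same n pair positions.
    pairs = Counter(zip(seq[:n], seq[k:]))
    left = Counter(seq[:n])
    right = Counter(seq[k:])

    info = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        ratio = p_ab * n * n / (left[a] * right[b])
        info += p_ab * math.log2(ratio)
    return info


if __name__ == "__main__":
    toy = "ACGT" * 50  # hypothetical, perfectly periodic toy sequence
    for k in (1, 2, 3, 4):
        print(k, mutual_information(toy, k))
```

For a sequence with no correlations at separation k, this estimate is close to zero apart from a small positive finite-size bias, which is why a noise threshold is needed before a peak in I(k) can be called a distinct correlation.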
