Wavelet based fractal analysis of DNA sequences

Abstract

The fractal scaling properties of DNA sequences are analyzed using the wavelet transform. Mapping nucleotide sequences onto a “DNA walk” produces fractal landscapes that can be studied quantitatively with the so-called wavelet transform modulus maxima (WTMM) method. This method provides a natural generalization of classical box-counting techniques to fractal signals, with the wavelets playing the role of “generalized oscillating boxes”. From the scaling behavior of partition functions defined from the wavelet transform modulus maxima, the method determines the singularity spectrum of the signal under study and thereby achieves a complete multifractal analysis. Moreover, by using analyzing wavelets that make the “wavelet transform microscope” blind to the “patches” of different nucleotide composition observed in genomic sequences, we demonstrate and quantify the existence of long-range correlations in noncoding regions. Although the fluctuations in the patchy landscape of the DNA walks reconstructed from both noncoding and (protein-)coding regions are found to be homogeneous, with Gaussian statistics, our wavelet-based analysis discriminates unambiguously between the fluctuations of the former, which behave like fractional Brownian motions, and those of the latter, which cannot be distinguished from uncorrelated random (Brownian) walks. We discuss the robustness of these results with respect to various legitimate codings of DNA sequences. Finally, we comment on a possible understanding of the origin of the observed long-range correlations in noncoding DNA sequences in terms of the nonequilibrium dynamical processes that produce the “isochore structure of the genome”.
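
The pipeline summarized above (DNA walk, wavelet transform, modulus maxima, partition functions) can be illustrated with a small, simplified sketch. In the WTMM formalism the partition function Z(q, a) = Σ |T_ψ[f](x, a)|^q, taken over the modulus maxima at scale a, behaves as a^τ(q); for a fractional Brownian motion with Hurst exponent H one has τ(q) = qH − 1, so the singularity spectrum collapses to the single point h = H, and an uncorrelated walk gives H = 1/2. Several assumptions apply, none of them taken from the paper: the purine (+1) / pyrimidine (−1) coding is just one of the possible codings; the Mexican hat (second derivative of a Gaussian, up to sign) stands in for a wavelet that is blind to the linear drifts that compositional patches produce in the walk; maxima are collected scale by scale rather than chained into maxima lines with the “sup” rule, so only q > 0 is meaningful here; and all function names, scales and the random toy sequence are hypothetical.

```python
import math
import random

# Assumed coding (one of several possible): purines A/G step +1,
# pyrimidines C/T step -1.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(seq):
    """Return the DNA walk y(n): running sum of +/-1 steps along seq."""
    walk, y = [], 0
    for base in seq.upper():
        y += STEP[base]  # KeyError for symbols other than A, C, G, T
        walk.append(y)
    return walk


def mexican_hat(a, width=5):
    """Mexican hat psi(u) = (1 - u^2) exp(-u^2 / 2) sampled at scale a on |u| <= width.

    Subtracting the mean makes the discrete kernel sum to zero, and its symmetry
    makes the first moment vanish, so constant and linear trends are filtered out.
    """
    half = int(width * a)
    k = [(1 - (j / a) ** 2) * math.exp(-0.5 * (j / a) ** 2) for j in range(-half, half + 1)]
    mean = sum(k) / len(k)
    return [v - mean for v in k], half


def cwt(signal, a):
    """T(b, a) = (1/a) * sum_x psi((x - b) / a) f(x), for interior positions b only."""
    kernel, half = mexican_hat(a)
    return [
        sum(w * f for w, f in zip(kernel, signal[b - half:b + half + 1])) / a
        for b in range(half, len(signal) - half)
    ]


def modulus_maxima(coeffs):
    """|T| at local maxima of |T(., a)| along b (simplified: no chaining across scales)."""
    m = [abs(c) for c in coeffs]
    return [m[i] for i in range(1, len(m) - 1)
            if m[i] > 0 and m[i] > m[i - 1] and m[i] >= m[i + 1]]


def slope(xs, ys):
    """Least-squares slope of ys versus xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


def tau_spectrum(signal, qs, scales=(2, 4, 8, 16, 32)):
    """Estimate tau(q), q > 0, from Z(q, a) = sum over maxima of |T|^q ~ a^tau(q)."""
    maxima = {a: modulus_maxima(cwt(signal, a)) for a in scales}
    log_a = [math.log2(a) for a in scales]
    return {q: slope(log_a, [math.log2(sum(m ** q for m in maxima[a])) for a in scales])
            for q in qs}


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "".join(rng.choice("ACGT") for _ in range(8192))  # uncorrelated toy sequence
    tau = tau_spectrum(dna_walk(seq), qs=(1, 2, 3))
    print("tau(q):", {q: round(t, 2) for q, t in tau.items()})
    print("H estimate tau(2) - tau(1):", round(tau[2] - tau[1], 2))
```

For the uncorrelated toy sequence, τ(q) should come out roughly linear in q with a slope near 1/2, as expected for an ordinary random walk. The mean-subtracted, symmetric kernel is a deliberate choice: it makes the discrete transform exactly insensitive to constant and linear trends, which is the property the abstract relies on to look past compositional patches.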
