Multifractal analysis and feature extraction of DNA sequences

This paper presents feature extraction and the estimation of multifractal measures for deoxyribonucleic acid (DNA) sequences, and demonstrates the intriguing possibility of identifying biological function from information contained within the DNA sequence itself. We have developed a technique that seeks patterns or correlations in the DNA sequence at a higher level. The technique has three main steps: (i) transforming the DNA sequence symbols into a modified Lévy walk, (ii) transforming the Lévy walk into a signal spectrum, and (iii) breaking the spectrum into subspectra and treating each one as an attractor from which the multifractal dimension spectrum is estimated. An optimal minimum window size and volume element size are found for estimating the multifractal measures. Experimental results show that DNA is multifractal, and that its multifractality changes depending on the location in the sequence (coding or noncoding region).
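The three-step pipeline above can be illustrated with a minimal sketch under several loudly stated assumptions. The exact rules of the "modified Lévy walk" are not given here, so the sketch substitutes a plain purine/pyrimidine DNA walk (+1 for A or G, -1 for C or T). It uses a naive discrete Fourier transform magnitude as the "signal spectrum". It assumes each subspectrum is the spectrum of one consecutive window of the walk, so that every estimate is tied to a location in the sequence. Each subspectrum is turned into a planar point set of (frequency bin, magnitude), and the Rényi (generalized) dimensions D_q are estimated by box counting over several volume element sizes eps. The function names (`dna_walk`, `magnitude_spectrum`, `renyi_dimensions`, `windowed_dq`), the window size, and the eps values are hypothetical and are not taken from the paper.

```python
import cmath
import math
from collections import Counter

# ASSUMPTION: purine/pyrimidine walk standing in for the paper's
# "modified Lévy walk", whose exact rules are not specified here.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(seq):
    """Cumulative walk: +1 for purines (A, G), -1 for pyrimidines (C, T)."""
    pos, walk = 0, []
    for base in seq.upper():
        pos += STEP.get(base, 0)  # unknown symbols (e.g. N) add no step
        walk.append(pos)
    return walk


def magnitude_spectrum(signal):
    """Naive O(n^2) DFT magnitude of the mean-removed signal, bins 0..n//2."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def renyi_dimensions(points, qs, eps_list):
    """Box-counting estimate of D_q for points in [0, 1) x [0, 1).

    For each eps, box probabilities p_i give the generalized entropy term
    log(sum p_i^q) / (q - 1), or sum p_i log p_i when q == 1. D_q is the
    least-squares slope of that term against log(eps).
    """
    n = len(points)
    results = {}
    for q in qs:
        xs, ys = [], []
        for eps in eps_list:
            counts = Counter((int(px / eps), int(py / eps)) for px, py in points)
            probs = [c / n for c in counts.values()]
            if q == 1:
                y = sum(p * math.log(p) for p in probs)  # information dimension
            else:
                y = math.log(sum(p ** q for p in probs)) / (q - 1)
            xs.append(math.log(eps))
            ys.append(y)
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum(
            (a - mx) ** 2 for a in xs
        )
        results[q] = slope
    return results


def windowed_dq(seq, window, qs=(0, 1, 2), eps_list=(1 / 4, 1 / 8, 1 / 16)):
    """Hypothetical pipeline: walk -> per-window spectrum -> D_q per window.

    Returns a list of (window start, {q: D_q}) pairs.
    """
    walk = dna_walk(seq)
    estimates = []
    for start in range(0, len(walk) - window + 1, window):
        spec = magnitude_spectrum(walk[start:start + window])
        peak = max(spec) or 1.0  # avoid dividing by zero for a flat window
        scale = peak * (1 + 1e-9)  # keep normalized magnitudes strictly below 1
        # ASSUMPTION: treat (bin index, magnitude) as the "attractor" points.
        pts = [(k / len(spec), m / scale) for k, m in enumerate(spec)]
        estimates.append((start, renyi_dimensions(pts, qs, eps_list)))
    return estimates
```

Calling `windowed_dq(seq, window)` returns one `(start, {q: D_q})` pair per window, so estimates from windows lying in coding and noncoding regions can be compared. Each subspectrum contributes only `window // 2 + 1` points, so the estimates are coarse. This illustrates why the choice of minimum window size and volume element size matters. Replacing the naive DFT with an FFT would be the obvious change for real sequence lengths.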
