Map-invariant spectral analysis for the identification of DNA periodicities

Many signal-processing-based methods for finding hidden periodicities in DNA sequences assign numerical values to the symbolic DNA sequence and then apply spectral analysis tools, such as the short-time discrete Fourier transform (ST-DFT), to locate these repeats. The key results pertaining to this approach, however, have been obtained using one very specific symbolic-to-numerical map, namely the so-called Voss representation. An important research problem is therefore to quantify the sensitivity of these results to the choice of the symbolic-to-numerical map. In this article, a novel algebraic approach to the periodicity detection problem is presented that provides a natural framework for studying the role of the symbolic-to-numerical map in finding these repeats. More specifically, we derive a new matrix-based expression of the DNA spectrum that includes most of the widely used mappings in the literature as special cases, show that the DNA spectrum is in fact invariant under all of these mappings, and derive a necessary and sufficient condition for the invariance of the DNA spectrum to the symbolic-to-numerical map. Furthermore, the new algebraic framework decomposes the periodicity detection problem into several fundamental building blocks that are mutually independent. Sophisticated digital filters and/or alternative fast data transforms, such as the discrete cosine and sine transforms, can therefore always be incorporated into the periodicity detection scheme regardless of the choice of the symbolic-to-numerical map. Although the newly proposed framework is matrix based, identification of these periodicities can be achieved at a low computational cost.
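As a rough illustration of the baseline approach the abstract refers to, the sketch below builds the Voss representation (one binary indicator sequence per nucleotide) and sums the squared DFT magnitudes of the four indicators. It is not the matrix-based expression derived in the article, which the abstract does not spell out. The function names, the toy sequence, and the use of a full-length DFT rather than a sliding-window ST-DFT are assumptions made here for brevity.

```python
import cmath

BASES = "ACGT"

def voss_indicators(seq):
    """Voss representation: one 0/1 indicator sequence per nucleotide."""
    return {b: [1.0 if s == b else 0.0 for s in seq] for b in BASES}

def dft_bin(x, k):
    """Single DFT coefficient X[k] of a real sequence, by direct summation."""
    n = len(x)
    return sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(x))

def dna_spectrum(seq):
    """S[k] = sum over the four bases of |X_b[k]|^2, for k = 0..N-1."""
    ind = voss_indicators(seq)
    return [sum(abs(dft_bin(ind[b], k)) ** 2 for b in BASES)
            for k in range(len(seq))]

# Hypothetical toy input: an exact period-3 repeat of length N = 36.
seq = "ATG" * 12
spec = dna_spectrum(seq)

# Skip k = 0 (it only reflects base counts) and search up to N/2.
peak = max(range(1, len(seq) // 2 + 1), key=lambda k: spec[k])
period = len(seq) / peak
```

For this toy input the largest non-DC bin falls at k = N/3, which corresponds to a repeat period of 3. The direct summation costs O(N^2) and is used only to keep the sketch dependency-free; a practical implementation would use an FFT and a sliding window.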
