Trigonometric transforms for finding repeats in DNA sequences

The detection of various forms of periodicity in DNA sequences has been an active area of research in recent years. Most signal-processing-based methods have focused on the short-time discrete Fourier transform (ST-DFT) as the key tool for identifying such repeat sequences. In this paper, we propose using other fast discrete transforms, namely the discrete cosine transform (DCT), the discrete sine transform (DST), and the discrete Hartley transform (DHT), to locate these patterns. Specifically, we derive a new unified multirate DSP model that (i) allows the derivation of new closed-form DNA spectrum expressions for the above trigonometric transforms, (ii) includes the DFT model as a special case, and (iii) suggests an efficient way to improve the detection of repeats by digital filtering.
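
As a rough illustration of the spectral idea described above, the following minimal sketch measures the strength of a period-p repeat using DFT and DHT coefficients. It is not the unified multirate model of the paper. It assumes the binary indicator-sequence mapping that is common in the genomic signal processing literature (one 0/1 sequence per base), which the abstract itself does not specify, and all function names are hypothetical. It also relies on the standard identity |X[k]|^2 = (H[k]^2 + H[N-k]^2) / 2 relating DFT and DHT coefficients of a real sequence.

```python
import math

BASES = "ACGT"

def indicator_sequences(dna):
    """Map a DNA string to four 0/1 indicator sequences (assumed mapping)."""
    dna = dna.upper()
    return {b: [1.0 if ch == b else 0.0 for ch in dna] for b in BASES}

def cos_sin_sums(x, k):
    """Return (sum x[n] cos(2*pi*k*n/N), sum x[n] sin(2*pi*k*n/N))."""
    n_len = len(x)
    c = sum(v * math.cos(2 * math.pi * k * n / n_len) for n, v in enumerate(x))
    s = sum(v * math.sin(2 * math.pi * k * n / n_len) for n, v in enumerate(x))
    return c, s

def dft_power(x, k):
    """Squared magnitude of the k-th DFT coefficient of a real sequence."""
    c, s = cos_sin_sums(x, k)
    return c * c + s * s

def dht_coeff(x, k):
    """k-th discrete Hartley transform coefficient (cas = cos + sin kernel)."""
    c, s = cos_sin_sums(x, k)
    return c + s

def total_dft_power(dna, k):
    """DFT power at index k, summed over the four indicator sequences."""
    return sum(dft_power(x, k) for x in indicator_sequences(dna).values())

def total_dht_power(dna, k):
    """Same quantity computed from DHT coefficients at indices k and N - k."""
    n_len = len(dna)
    total = 0.0
    for x in indicator_sequences(dna).values():
        total += 0.5 * (dht_coeff(x, k) ** 2 + dht_coeff(x, (n_len - k) % n_len) ** 2)
    return total

def short_time_power(dna, period, window, step):
    """Slide a window along the sequence and report power at index window // period."""
    k = window // period  # window should be a multiple of period
    return [total_dft_power(dna[start:start + window], k)
            for start in range(0, len(dna) - window + 1, step)]
```

For a sequence of length N, a repeat of period p concentrates energy at index k = N / p, so a call such as `total_dft_power("ACG" * 8, 8)` is large while the value at an unrelated index such as k = 5 is close to zero. The direct O(N) sums per coefficient are used only for clarity; a fast transform would replace them in practice.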
