Digital signal processing for gene prediction

Identifying gene locations in a DNA sequence is an important problem in genomics. Nucleotides in the exons (protein-coding regions) of a DNA sequence show period-3 behavior, i.e., a spectral peak at normalized frequency f = 1/3. This period-3 property in the exons of eukaryotic gene sequences enables signal-processing-based time-domain and frequency-domain methods to predict these regions. Identifying the period-3 regions helps predict gene locations within the billions-of-nucleotides-long DNA sequences of eukaryotic cells. Existing non-parametric filtering techniques are less effective at detecting small exons. This paper presents a harmonic suppression filter and a parametric minimum variance (MV) spectrum estimation technique for gene prediction. We show that both techniques are able to detect smaller exon regions, and that the adaptive MV filter minimizes the power in introns (non-coding regions), giving greater suppression of the intron regions. Furthermore, 2-simplex mapping is used to reduce the computational complexity.
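As a rough illustration of the period-3 property only, the sketch below computes sliding-window spectral power at f = 1/3 from four binary indicator sequences (one per base). This is a conventional non-parametric baseline, not the harmonic suppression filter or MV estimator presented in the paper. The function name `period3_power`, the default window length of 351, and the choice of binary indicator mapping are assumptions made for this example.

```python
import cmath

BASES = "ACGT"


def period3_power(seq, window=351, step=1):
    """Sliding-window power at f = 1/3, summed over four indicator sequences.

    Hypothetical helper for illustration; `window` should be a multiple of 3
    so that f = 1/3 falls exactly on a DFT bin.
    """
    seq = seq.upper()
    # exp(-j*2*pi*n/3) repeats every 3 samples, so three values suffice.
    twiddle = [cmath.exp(-2j * cmath.pi * k / 3) for k in range(3)]
    powers = []
    for start in range(0, len(seq) - window + 1, step):
        total = 0.0
        for base in BASES:
            # DFT coefficient at f = 1/3 of the indicator sequence for `base`.
            acc = 0j
            for n in range(window):
                if seq[start + n] == base:
                    acc += twiddle[n % 3]
            total += abs(acc) ** 2
        # Normalize by window length squared so values are comparable
        # across different window sizes.
        powers.append(total / window ** 2)
    return powers


if __name__ == "__main__":
    periodic = period3_power("ATG" * 40, window=60)
    flat = period3_power("A" * 120, window=60)
    print(max(periodic), max(flat))
```

A strictly period-3 sequence yields high power in every window, while a homopolymer yields essentially zero. On real sequences, peaks in the returned list suggest candidate coding regions; short exons tend to be smeared by long windows, which is the weakness of non-parametric methods noted in the abstract.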
