Improved Comb Filter based Approach for Effective Prediction of Protein Coding Regions in DNA Sequences

The prediction of protein coding regions in DNA sequences is an important problem in computational biology. Nucleotides in the protein coding regions (exons) of a DNA sequence exhibit a period-3 property, so identifying period-3 regions helps locate genes within the billions-of-nucleotides-long DNA sequences of eukaryotic cells. This property allows signal processing methods, in both the time domain and the frequency domain, to predict these regions efficiently, and several such approaches have been applied to the problem. This paper describes novel and efficient comb filter-based techniques for predicting protein coding regions from the period-3 behavior of codon sequences. The proposed method is validated on the Burset/Guigo1996, HMR195, and KEGG standard datasets using various prediction measures. The results show that a cascaded differentiator comb (CDC) filter predicts protein coding regions with better prediction efficiency and lower computational complexity than other signal processing techniques based on the period-3 property.
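To make the period-3 idea concrete, the following is a minimal sketch of how a differentiator followed by a comb filter can emphasize period-3 content in a DNA string. It is not the paper's CDC filter, whose exact transfer function is not given in this abstract. The function names `voss_indicators` and `period3_score`, the binary indicator mapping, and the parameters `delay` and `taps` are all assumptions chosen for illustration.

```python
import numpy as np

def voss_indicators(seq):
    """Map a DNA string to four binary indicator sequences, one per base.
    (Assumed numerical mapping; the abstract does not say which one is used.)"""
    seq = seq.upper()
    return {b: np.array([1.0 if c == b else 0.0 for c in seq]) for b in "ACGT"}

def period3_score(seq, delay=3, taps=30):
    """Illustrative differentiator + comb cascade (hypothetical parameters).

    The first difference (1 - z^-1) removes the DC component; the feedforward
    comb, with unit taps every `delay` samples, reinforces components that
    repeat with period `delay`. The output is the summed squared response
    over the four indicator sequences.
    """
    comb = np.zeros(delay * (taps - 1) + 1)
    comb[::delay] = 1.0                      # taps at 0, 3, 6, ...
    total = np.zeros(len(seq))
    for x in voss_indicators(seq).values():
        d = np.diff(x, prepend=x[0])         # first difference, same length as x
        y = np.convolve(d, comb, mode="same")
        total += y ** 2
    return total

# Usage: a synthetic period-3 stretch flanked by random bases.
rng = np.random.default_rng(0)
flank = "".join(rng.choice(list("ACGT"), 300))
score = period3_score(flank + "ATG" * 100 + flank)
```

On such a synthetic input the score should be markedly higher inside the repeated stretch than in the random flanks. Real exons show a much weaker periodicity, so a practical predictor would also need thresholding and the evaluation measures the abstract mentions.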
