Identification of Protein-Coding Regions in DNA Sequences Using A Time-Frequency Filtering Approach

Accurate identification of protein-coding regions (exons) in DNA sequences has been a challenging task in bioinformatics. Particularly the coding regions have a 3-base periodicity, which forms the basis of all exon identification methods. Many signal processing tools and techniques have been applied successfully for the identification task but still improvement in this direction is needed. In this paper, we have introduced a new promising model-independent time-frequency filtering technique based on S-transform for accurate identification of the coding regions. The S-transform is a powerful linear time-frequency representation useful for filtering in time-frequency domain. The potential of the proposed technique has been assessed through simulation study and the results obtained have been compared with the existing methods using standard datasets. The comparative study demonstrates that the proposed method outperforms its counterparts in identifying the coding regions.

[1]  P. P. Vaidyanathan,et al.  The role of signal-processing concepts in genomics and proteomics , 2004, J. Frankl. Inst..

[2]  R. Linsker,et al.  A measure of DNA periodicity. , 1986, Journal of theoretical biology.

[3]  Leonidas D. Iasemidis,et al.  Autoregressive Modeling and Feature Analysis of DNA Sequences , 2004, EURASIP J. Adv. Signal Process..

[4]  Trevor W. Fox,et al.  A Digital Signal Processing Method for Gene Prediction with Improved Noise Suppression , 2004, EURASIP J. Adv. Signal Process..

[5]  S. Tiwari,et al.  Prediction of probable genes by Fourier analysis of genomic sequences , 1997, Comput. Appl. Biosci..

[6]  P. P. Vaidyanathan,et al.  GENE AND EXON PREDICTION USING ALLPASS-BASED FILTERS , 2002 .

[7]  C. Robert Pinnegar,et al.  Time-frequency and time-time filtering with the S-transform and TT-transform , 2005, Digit. Signal Process..

[8]  Jin Jiang,et al.  Time-frequency feature representation using energy concentration: An overview of recent advances , 2009, Digit. Signal Process..

[9]  Ingrid Daubechies,et al.  The wavelet transform, time-frequency localization and signal analysis , 1990, IEEE Trans. Inf. Theory.

[10]  A. Nair,et al.  A coding measure scheme employing electron-ion interaction pseudopotential (EIIP) , 2006, Bioinformation.

[11]  S. Qian,et al.  Joint time-frequency analysis , 1999, IEEE Signal Process. Mag..

[12]  H E Stanley,et al.  Finding borders between coding and noncoding DNA regions by an entropic segmentation method. , 2000, Physical review letters.

[13]  P.D. Cristea,et al.  Genomic signal processing , 2004, 7th Seminar on Neural Network Applications in Electrical Engineering, 2004. NEUREL 2004. 2004.

[14]  James W. Fickett,et al.  The Gene Identification Problem: An Overview for Developers , 1995, Comput. Chem..

[15]  Chris H. Q. Ding,et al.  Multi-class protein fold recognition using support vector machines and neural networks , 2001, Bioinform..

[16]  R. Voss,et al.  Evolution of long-range fractal correlations and 1/f noise in DNA base sequences. , 1992, Physical review letters.

[17]  P Kiran Sree Protein coding region identification , 2013 .

[18]  J. Fickett,et al.  Assessment of protein coding measures. , 1992, Nucleic acids research.

[19]  R Zhang,et al.  Z curves, an intutive tool for visualizing and analyzing the DNA sequences. , 1994, Journal of biomolecular structure & dynamics.

[20]  L.J. Stankovic,et al.  Time-frequency signal processing approaches with applications to heart sound analysis , 2006, 2006 Computers in Cardiology.

[21]  M. N. Shanmukha Swamy,et al.  Analysis of Genomics and Proteomics Using DSP Techniques , 2008, IEEE Transactions on Circuits and Systems I: Regular Papers.

[22]  Amir Asif,et al.  A fast DFT based gene prediction algorithm for identification of protein coding regions , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[23]  A A Tsonis,et al.  Periodicity in DNA coding sequences: implications in gene evolution. , 1991, Journal of theoretical biology.

[24]  S. O. Aase,et al.  Eukaryotic Gene Prediction by Spectral Analysis and Pattern Recognition Techniques , 2006, Proceedings of the 7th Nordic Signal Processing Symposium - NORSIG 2006.

[25]  Steven Salzberg,et al.  Finding Genes in DNA with a Hidden Markov Model , 1997, J. Comput. Biol..

[26]  Alan K. Mackworth,et al.  Evaluation of gene-finding programs on mammalian sequences. , 2001, Genome research.

[27]  E. Snyder,et al.  Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks. , 1993, Nucleic acids research.

[28]  Mahmood Akhtar,et al.  Comparison of Gene and Exon Prediction Techniques for Detection of Short Coding Regions , 2006 .

[29]  J. Tuqan,et al.  A DSP perspective to the period-3 detection problem , 2006, 2006 IEEE International Workshop on Genomic Signal Processing and Statistics.

[30]  I. Cosic Macromolecular bioactivity: is it resonant interaction between macromolecules?-theory and applications , 1994, IEEE Transactions on Biomedical Engineering.

[31]  J L Oliver,et al.  On the origin of the periodicity of three in protein coding DNA sequences. , 1994, Journal of theoretical biology.

[32]  C. A. Chatzidimitriou-Dreismann,et al.  Long-range correlations in DNA , 1993, Nature.

[33]  P.P. Vaidyanathan,et al.  Digital filters for gene prediction applications , 2002, Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, 2002..

[34]  C. Zhang,et al.  Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve. , 2000, Nucleic acids research.

[35]  Lalu Mansinha,et al.  Localization of the complex spectrum: the S transform , 1996, IEEE Trans. Signal Process..

[36]  C. Robert Pinnegar,et al.  The S-transform with windows of arbitrary and varying shape , 2003 .