Improved exon prediction with transforms by de-noising period-3 measure

Eukaryotic gene-finding techniques can be divided into two categories: model-dependent and model-independent. In the model-independent category, transforms are commonly used to identify exons or genes in DNA sequences. In this work, a Post-Processing Algorithm (PPA) is developed to enhance the gene-prediction capability of transforms. PPA compares the N/3 spectral components of the DNA signal with the corresponding spectrum of the period-3-suppressed DNA signal. Bases at which the difference between these two N/3 spectra lies within a predefined threshold are marked as non-coding (intronic) regions, and in those regions the signal values are replaced by the difference signal of the two spectra. This substitution suppresses noise in the intronic regions of the N/3 spectrum while leaving the coding (exonic) region signals unaffected, yielding de-noised period-3 measures. PPA has been applied to the period-3 coefficients of the Discrete Fourier Transform (DFT), Paired Spectral Content (PSC), and Modified Gabor Wavelet Transform (MGWT) methods to de-noise their period-3 measures. The performance of the algorithm has been evaluated on the HMR195, Burset/Guigo570, and Asp67 datasets using Receiver Operating Characteristic (ROC) curves and specificity-versus-sensitivity curves. While preserving the model-independent character of transform-based methods, PPA improves the probability of correctly predicting exonic regions.
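The post-processing step can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on stated assumptions: the period-3 measure is the common sliding-window DFT formulation, summing |X_b(N/3)|^2 over the binary indicator sequences of A, C, G and T. The abstract does not say how the period-3-suppressed signal is produced, so a hypothetical three-tap moving sum (whose zeros lie at e^{±j2π/3}) stands in for it. The absolute difference used as the "difference signal", the names `period3_measure`, `ppa` and `make_demo_sequence`, and the window and threshold values are all illustrative assumptions.

```python
import cmath
import random

BASES = "ACGT"
# DFT kernel at the N/3 bin (frequency 2*pi/3): e^{-j 2 pi n / 3} depends only on n mod 3.
_W = cmath.exp(-2j * cmath.pi / 3)
PHASE = (1.0 + 0j, _W, _W * _W)


def indicators(seq):
    """Binary indicator sequences: u_b[n] = 1 if seq[n] == b else 0."""
    return {b: [1.0 if c == b else 0.0 for c in seq] for b in BASES}


def suppress_period3(x):
    """Hypothetical period-3 suppressor (an assumption, not from the paper).

    y[n] = x[n] + x[n-1] + x[n-2]; H(z) = 1 + z^-1 + z^-2 has zeros at
    e^{+/- j 2 pi / 3}, so the period-3 component cancels except for
    terms at the window edges.
    """
    return [x[n] + (x[n - 1] if n >= 1 else 0.0) + (x[n - 2] if n >= 2 else 0.0)
            for n in range(len(x))]


def period3_measure(seq, window=51, suppress=False):
    """Sliding-window period-3 measure S[c] = sum_b |X_b(N/3)|^2."""
    u = indicators(seq.upper())
    if suppress:
        u = {b: suppress_period3(x) for b, x in u.items()}
    n, half = len(seq), window // 2
    out = []
    for c in range(n):
        # Window centred on base c, truncated at the sequence ends.
        lo, hi = max(0, c - half), min(n, c + half + 1)
        total = 0.0
        for x in u.values():
            X = sum(x[m] * PHASE[m % 3] for m in range(lo, hi))
            total += abs(X) ** 2
        out.append(total)
    return out


def ppa(seq, window=51, threshold=300.0):
    """Sketch of the PPA substitution rule as read from the abstract."""
    s = period3_measure(seq, window)
    s_sup = period3_measure(seq, window, suppress=True)
    out = []
    for a, b in zip(s, s_sup):
        d = abs(a - b)  # assumed: magnitude of the difference of the two spectra
        # Within threshold: treat the base as non-coding and substitute d.
        # Otherwise: keep the original (coding-region) value unchanged.
        out.append(d if d <= threshold else a)
    return out


def make_demo_sequence(seed=0, flank=150, repeats=50):
    """Hypothetical test input: random flanks around a strongly periodic 'ATG' block."""
    rng = random.Random(seed)
    left = "".join(rng.choice(BASES) for _ in range(flank))
    right = "".join(rng.choice(BASES) for _ in range(flank))
    return left + "ATG" * repeats + right
```

For example, `ppa(make_demo_sequence())` returns one value per base: in the middle of the ATG block the two spectra differ by far more than the threshold, so the original period-3 values are kept, while in the random flanks values within the threshold are replaced by the difference signal. Longer windows and a data-driven threshold would be needed for real sequences.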
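The ROC and specificity-versus-sensitivity evaluation can likewise be sketched at the nucleotide level. This sketch assumes each base carries a 0/1 exon label and is called exonic when its measure meets a decision threshold. It uses the ROC convention Sp = TN/(TN+FP); gene-finding evaluations in the Burset and Guigó tradition often define specificity as TP/(TP+FP) instead, so the convention should be checked against the paper. The function names are hypothetical.

```python
def sens_spec(scores, labels, threshold):
    """Nucleotide-level sensitivity TP/(TP+FN) and ROC specificity TN/(TN+FP)."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = s >= threshold  # base called exonic
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both exonic and intronic bases")
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        # Bases with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Comparing `auc(roc_points(raw, labels))` with `auc(roc_points(ppa_output, labels))` on the same labelled sequence is one way to quantify whether de-noising moves the ROC curve toward the upper-left corner.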
