Prediction of Protein Coding Regions Using a Wide-Range Wavelet Window Method

Prediction of protein coding regions is an important topic in genomic sequence analysis, and several spectrum-based techniques have been proposed for it. A persistent limitation of most of these techniques is that they depend on a predefined window length whose value is selected experimentally. In this paper, we propose a new Wide-Range Wavelet Window (WRWW) method for the prediction of protein coding regions. Analysis of the proposed wavelet window shows that the width of its frequency response adapts to changes in the window length, so that the window can admit or reject frequencies other than the basic frequency when analyzing DNA sequences. This property allows the proposed window to analyze DNA sequences over a wide range of window lengths without degradation in performance. Experiments applying the WRWW method and other spectrum-based methods to five benchmark datasets show that the proposed method outperforms the other methods across a wide range of window lengths. The experiments also show that the proposed method outperforms the other methods in predicting both short and long exons.
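To make the window-length dependence concrete, the following is a minimal sketch of a conventional fixed-window spectral measure of the kind the abstract contrasts with. It is not the WRWW method, whose window is not specified here. The sketch assumes binary indicator sequences for A, C, G, and T, a rectangular window, and a period-3 analysis frequency of 1/3; the function name `period3_power` and its parameters are hypothetical and chosen only for illustration.

```python
import cmath


def period3_power(seq, window, step=1):
    """Sliding-window DFT power at frequency 1/3 (illustrative baseline only).

    For each window position, the four binary indicator sequences
    (one per base) are projected onto exp(-2j*pi*n/3) and the squared
    magnitudes are summed. The window length is a fixed input, which is
    the dependence on a predefined value that the abstract describes.
    """
    seq = seq.upper()
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")

    # The complex exponential at frequency 1/3 takes only three values.
    twiddle = [cmath.exp(-2j * cmath.pi * k / 3) for k in range(3)]

    powers = []
    for start in range(0, len(seq) - window + 1, step):
        acc = {"A": 0j, "C": 0j, "G": 0j, "T": 0j}
        for n in range(window):
            base = seq[start + n]
            if base in acc:  # skip ambiguous symbols such as N
                acc[base] += twiddle[n % 3]
        powers.append(sum(abs(v) ** 2 for v in acc.values()))
    return powers


if __name__ == "__main__":
    # A toy sequence with strict three-base periodicity versus one without.
    periodic = period3_power("ATG" * 40, window=30)
    aperiodic = period3_power("ACGT" * 30, window=24)
    print(max(periodic), max(aperiodic))
```

With a rectangular window, a short window gives a broad frequency response and a noisy measure, while a long window smears the boundaries of short exons. This trade-off is why such baselines are sensitive to the chosen window length.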
