Improvement in protein-coding region identification based on sliding window trigonometric fast transforms using Singular Value Decomposition

In this paper, the performance of various sliding-window trigonometric fast transforms for identifying protein-coding regions has been analysed at the nucleotide level. The Short-Time Discrete Fourier Transform (ST-DFT) is found to give better identification accuracy than the Short-Time Discrete Cosine Transform (ST-DCT), the Short-Time Discrete Sine Transform (ST-DST) and the Short-Time Discrete Hartley Transform (ST-DHT). In the proposed method, the identification accuracy of protein-coding regions is improved by applying Singular Value Decomposition (SVD) to the DNA spectrum obtained with the sliding-window trigonometric fast transforms. The results show that, with the proposed method, all of the trigonometric fast transforms give nearly the same area under the ROC curve on the GENSCAN test set.
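The pipeline summarised above can be illustrated with a minimal Python/NumPy sketch built on several assumptions of ours. It maps the sequence to the four binary indicator sequences (A, C, G, T), slides a window of length N (351 here, an assumed value) along each one, and sums the squared magnitude of one transform coefficient at the period-3 frequency ω = 2π/3 into a single value per nucleotide. Rather than running a fast full transform, it evaluates only that one coefficient per window, which is all the period-3 measure needs. The half-sample-shifted (type-II style) DCT and DST kernels are our index convention. The abstract does not say how SVD is applied to the spectrum, so the hypothetical `svd_enhance` substitutes a singular-spectrum-analysis style low-rank reconstruction of the spectrum's Hankel matrix; its window `K` and `rank` are illustrative parameters, not values from the paper. `auc` computes the nucleotide-level area under the ROC curve from the Mann-Whitney rank statistic.

```python
import numpy as np

OMEGA = 2 * np.pi / 3  # period-3 angular frequency in rad/sample


def indicator_sequences(seq):
    """Binary indicator sequences u_A, u_C, u_G, u_T as a (4, L) float array."""
    chars = np.array(list(seq.upper()))
    return np.stack([(chars == b).astype(float) for b in "ACGT"])


def basis(kind, N):
    """Length-N kernel of one transform, evaluated only at OMEGA.

    The half-sample shift used for the DCT/DST (type-II style) is an assumption.
    """
    n = np.arange(N)
    if kind == "dft":
        return np.exp(-1j * OMEGA * n)
    if kind == "dht":
        return np.cos(OMEGA * n) + np.sin(OMEGA * n)  # Hartley 'cas' kernel
    if kind == "dct":
        return np.cos(OMEGA * (n + 0.5))
    if kind == "dst":
        return np.sin(OMEGA * (n + 0.5))
    raise ValueError(f"unknown transform: {kind}")


def period3_spectrum(seq, N=351, kind="dft"):
    """Sliding-window period-3 measure S = sum_b |X_b(OMEGA)|^2, one value per nucleotide."""
    u = indicator_sequences(seq)  # (4, L)
    if u.shape[1] < N:
        raise ValueError("sequence must be at least N nucleotides long")
    windows = np.lib.stride_tricks.sliding_window_view(u, N, axis=1)  # (4, L-N+1, N)
    b = basis(kind, N)
    # Project every window onto the real and imaginary parts of the kernel.
    proj = windows @ np.stack([b.real, b.imag], axis=1)  # (4, L-N+1, 2)
    S = np.sum(proj ** 2, axis=(0, 2))  # |X|^2 = Re^2 + Im^2, summed over bases
    # Attribute each window to its centre nucleotide; repeat the edge values.
    return np.pad(S, (N // 2, N - 1 - N // 2), mode="edge")


def svd_enhance(S, K=120, rank=1):
    """Hypothetical SVD step: SSA-style rank-`rank` reconstruction of S.

    S is embedded in a K x (L-K+1) Hankel matrix, truncated to its leading
    singular triplets, and mapped back by averaging each anti-diagonal.
    """
    S = np.asarray(S, dtype=float)
    L = len(S)
    H = np.lib.stride_tricks.sliding_window_view(S, K).T  # H[i, j] = S[i + j]
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    Hr = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # low-rank approximation of H
    out = np.zeros(L)
    count = np.zeros(L)
    for i in range(K):  # row i contributes to samples i .. i + L - K
        out[i:i + Hr.shape[1]] += Hr[i]
        count[i:i + Hr.shape[1]] += 1
    return out / count


def auc(scores, labels):
    """Nucleotide-level ROC AUC via the Mann-Whitney rank statistic (ties share ranks)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)  # 1-based rank of the last copy of each distinct value
    ranks = (last - (counts - 1) / 2.0)[inverse]  # average rank within each tie group
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For a sequence string `seq` of at least N nucleotides and a boolean per-nucleotide exon mask `labels`, `auc(svd_enhance(period3_spectrum(seq, 351, "dct")), labels)` scores the sketched SVD-enhanced ST-DCT measure. A single real DCT, DST or DHT coefficient captures only one quadrature component of the period-3 signal, so its raw value fluctuates from one window position to the next; the smoothing performed by the low-rank reconstruction is one possible reason the transforms end up closer together, though that is our reading rather than a claim made in the abstract.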
