A multirate DSP structure for the identification of protein-coding regions

Identifying protein-coding regions in DNA sequences with digital signal processing methods is one of the central problems in bioinformatics. This paper proposes a multirate structure for identifying protein-coding regions in which the input sampling rate is the same as the output sampling rate. The structure is a cascade of a decimation filter, a kernel filter, and an interpolation filter. The decimation filter is a complex filter, the kernel filter is an FIR lowpass filter, and the interpolation filter is a moving average filter. Polyphase decomposition is applied to both the decimation filter and the interpolation filter for a computationally efficient implementation. The proposed method is evaluated against existing methods on standard datasets. The results show that it substantially improves the identification accuracy of protein-coding regions compared with its counterparts.
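The cascade described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. Everything specific here is an assumption made for illustration: the rate-change factor M = 3 (chosen to match the period-3 property of coding DNA), the binary indicator mapping of the four bases, the Hamming-window filter designs, the filter lengths, and all function names. Polyphase decomposition is shown only for the decimator; the interpolator is written in direct form for brevity.

```python
import numpy as np

M = 3  # assumed rate-change factor (period-3 property)

def decimate_direct(x, h, m=M):
    """Reference form: filter at the high rate, then keep every m-th sample."""
    return np.convolve(x, h)[::m]

def decimate_polyphase(x, h, m=M):
    """Same output as decimate_direct, with every branch running at the low rate."""
    x = np.asarray(x)
    n_out = -(-(len(x) + len(h) - 1) // m)     # ceil(full convolution length / m)
    y = np.zeros(n_out, dtype=complex)
    for k in range(m):
        e_k = h[k::m]                          # k-th polyphase component of h
        # Branch input x_k[n] = x[n*m - k], with x[-k] = 0 for k > 0.
        x_k = x[::m] if k == 0 else np.concatenate(([0.0], x[m - k::m]))
        branch = np.convolve(x_k, e_k)
        n = min(len(branch), n_out)
        y[:n] += branch[:n]
    return y

def period3_power(seq, dec_len=45, ker_len=101, m=M):
    """Sketch of the cascade: complex decimator, FIR lowpass kernel, moving-average interpolator."""
    seq = seq.upper()
    n_in = len(seq)
    w = np.hamming(dec_len)
    # Assumed complex decimation filter: lowpass prototype shifted to 2*pi/3.
    h_dec = (w / w.sum()) * np.exp(2j * np.pi * np.arange(dec_len) / 3)
    h_ker = np.hamming(ker_len) / np.hamming(ker_len).sum()  # assumed kernel design
    h_int = np.ones(m)                                       # moving average of length m
    # Total group delay, expressed at the high (input) rate.
    delay = (dec_len - 1) // 2 + m * ((ker_len - 1) // 2) + (m - 1) // 2
    power = np.zeros(n_in)
    for base in "ACGT":
        x = np.array([c == base for c in seq], dtype=float)  # binary indicator sequence
        v = np.convolve(decimate_polyphase(x, h_dec, m), h_ker)  # low-rate kernel filtering
        up = np.zeros(len(v) * m, dtype=complex)
        up[::m] = v                                          # zero insertion back to the input rate
        y = np.convolve(up, h_int)
        power += np.abs(y[delay:delay + n_in]) ** 2
    return power
```

Calling `period3_power(dna_string)` returns one value per input base, so the output rate matches the input rate as in the described structure. Regions with a strong period-3 component should show larger values, and any threshold for calling a region coding would have to be chosen separately.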
