A new algorithm for predicting protein coding regions based on the hybrid threshold

In protein coding region prediction, the predicted non-coding percentile (PNCP) is usually used as the threshold that separates a DNA sequence into two kinds of regions: protein coding regions and non-coding regions. In this paper, a new protein coding region prediction algorithm based on a hybrid threshold is presented. First, the normalized power spectral density (PSD) of a DNA sequence is calculated using the prediction algorithm based on narrow pass-band filters (NPBF), with the maximum PSD value of the sequence used as the normalization standard. Second, the PSD curve of a coding region is modeled as a trapezoid characterized by its slope and height. Third, a method is presented for calculating the hybrid threshold, which takes both the slope and the height into account. Finally, the algorithm is evaluated on the HMR195 DNA dataset. With the approximate correlation (AC) as the measure of prediction accuracy, the proposed algorithm reaches an AC of 0.48 on this dataset, which is much better than the modified Gabor wavelet transform algorithm. The prediction results are also presented in the form of q9, proposed by Chun-Ting Zhang in 2002.
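The normalization and evaluation steps named above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `window` parameter, and the use of a sliding-window DFT at frequency 1/3 in place of the narrow pass-band filter are all assumptions made for illustration. The trapezoid model and the hybrid threshold are not reproduced, because the abstract gives no formulas for them. The AC measure follows the standard definition as the average of four conditional probabilities, rescaled to [-1, 1]; averaging only over the defined terms when a denominator is zero is the convention assumed here.

```python
import numpy as np


def normalized_period3_psd(seq, window=351):
    """Period-3 power along a DNA sequence, divided by its maximum value.

    Assumption: a rectangular sliding-window DFT at frequency 1/3 stands in
    for the narrow pass-band filter (NPBF) stage; `window` is hypothetical.
    """
    seq = seq.upper()
    n = len(seq)
    window = min(window, n)  # keeps the "same"-mode output at length n
    carrier = np.exp(-2j * np.pi * np.arange(n) / 3)
    kernel = np.ones(window)
    psd = np.zeros(n)
    for base in "ACGT":
        indicator = np.array([c == base for c in seq], dtype=float)
        # Windowed sum of the demodulated indicator sequence: the magnitude
        # equals that of the local DFT coefficient at frequency 1/3.
        coeff = np.convolve(indicator * carrier, kernel, mode="same")
        psd += np.abs(coeff) ** 2
    peak = psd.max()
    return psd / peak if peak > 0 else psd


def approximate_correlation(predicted, actual):
    """Nucleotide-level approximate correlation (AC) of two binary masks."""
    predicted = np.asarray(predicted, dtype=bool)
    actual = np.asarray(actual, dtype=bool)
    if predicted.shape != actual.shape or predicted.size == 0:
        raise ValueError("masks must be non-empty and of equal length")
    tp = int(np.sum(predicted & actual))
    tn = int(np.sum(~predicted & ~actual))
    fp = int(np.sum(predicted & ~actual))
    fn = int(np.sum(~predicted & actual))
    pairs = [(tp, tp + fn), (tp, tp + fp), (tn, tn + fp), (tn, tn + fn)]
    # Assumed convention: average only the terms whose denominator is nonzero.
    terms = [num / den for num, den in pairs if den > 0]
    acp = sum(terms) / len(terms)
    return (acp - 0.5) * 2
```

A predicted mask could be obtained by comparing the normalized PSD with any threshold, for example `normalized_period3_psd(seq) >= 0.5`, and then scored against an annotated mask with `approximate_correlation`. The value 0.5 is only a placeholder and is not the hybrid threshold of the paper.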
