A Modified Statistically Optimal Null Filter Method for Recognizing Protein-coding Regions

Computer-aided prediction of protein-coding genes in uncharacterized genomic DNA sequences is one of the most important problems in biological signal processing. A modified filter method based on statistically optimal null filter (SONF) theory is proposed for recognizing protein-coding regions. The square deviation gain (SDG) between the input and output of the model is used to identify the coding regions, and an SDG amplification model with Class I and Class II enhancement is designed to suppress the non-coding regions. An evaluation algorithm is also used to compare the modified model with most currently available gene prediction methods in terms of sensitivity, specificity and precision. Evaluated at the nucleotide level on benchmark datasets, the method achieved a sensitivity of 91.4%, a specificity of 96% and a precision of 93.7%. These results suggest that the proposed model is potentially useful for gene finding and can recognize protein-coding regions with higher precision and speed than existing algorithms.
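The nucleotide-level evaluation described in the abstract can be sketched as follows. This is a minimal sketch, not the paper's evaluation algorithm: every base is labeled coding or non-coding, and the prediction is compared with the annotation base by base. It assumes the classic definitions Sn = TP/(TP+FN) and Sp = TN/(TN+FP). Because the reported precision (93.7%) equals the mean of the reported sensitivity (91.4%) and specificity (96%), the sketch also assumes precision = (Sn + Sp)/2. The helper names `coding_mask` and `nucleotide_level_scores` and the 0-based, half-open exon interval convention are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class NucleotideScores:
    sensitivity: float  # Sn = TP / (TP + FN)
    specificity: float  # Sp = TN / (TN + FP)
    precision: float    # ASSUMPTION: taken as (Sn + Sp) / 2


def coding_mask(length: int, exons: Iterable[Tuple[int, int]]) -> List[bool]:
    """Hypothetical helper: per-base labels from 0-based, half-open (start, end) exons."""
    mask = [False] * length
    for start, end in exons:
        for i in range(start, end):
            mask[i] = True
    return mask


def nucleotide_level_scores(actual: Sequence[bool],
                            predicted: Sequence[bool]) -> NucleotideScores:
    """Score a prediction base by base (True = coding, False = non-coding)."""
    if len(actual) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # coding, called coding
    fn = sum(1 for a, p in pairs if a and not p)      # coding, missed
    tn = sum(1 for a, p in pairs if not a and not p)  # non-coding, rejected
    fp = sum(1 for a, p in pairs if not a and p)      # non-coding, called coding
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both coding and non-coding bases to score")
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    return NucleotideScores(sn, sp, (sn + sp) / 2)


if __name__ == "__main__":
    truth = coding_mask(20, [(5, 15)])  # annotated exon covers bases 5..14
    guess = coding_mask(20, [(7, 15)])  # predicted exon misses bases 5 and 6
    print(nucleotide_level_scores(truth, guess))
```

In the usage example the prediction recovers 8 of the 10 coding bases and raises no false positives, giving Sn = 0.8, Sp = 1.0 and, under the assumed definition, a precision of 0.9. Counting per base rather than per exon matches the abstract's statement that performance was evaluated at the nucleotide level.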
