Identification of Protein Coding Regions in the Eukaryotic DNA Sequences Based on Marple Algorithm and Wavelet Packets Transform

The identification of protein coding regions (exons) plays a critical role in eukaryotic gene structure prediction. Many techniques have been introduced to discriminate between exons and introns in eukaryotic DNA sequences, such as those based on the discrete Fourier transform (DFT), but DFT-based methods rapidly lose effectiveness on short DNA sequences. In this paper, a novel integrated algorithm based on autoregressive spectrum analysis and the wavelet packets transform is presented to improve the efficiency and accuracy of coding region identification. Experimental results on the GENSCAN65, HMR195, and BG570 benchmark datasets show that the new algorithm distinctly outperforms conventional DFT-based approaches in prediction accuracy for protein coding regions.
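
For orientation, the conventional DFT-based baseline mentioned above can be sketched as a sliding-window period-3 power measure. This is a minimal illustration and not the algorithm proposed in this paper. The function names `period3_power` and `sliding_period3`, the binary indicator mapping of the four bases, and the default window and step sizes are all assumptions made for the example.

```python
import cmath


def period3_power(window):
    """Period-3 spectral power of one DNA window (hypothetical helper).

    Each base A, C, G, T is mapped to a binary indicator sequence, and the
    squared magnitudes of the DFT coefficients at frequency 1/3 are summed.
    For a window whose length N is a multiple of 3, this is the DFT bin
    k = N/3.
    """
    total = 0.0
    for base in "ACGT":
        # DFT of the indicator sequence evaluated at frequency 1/3.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * n / 3)
            for n, symbol in enumerate(window)
            if symbol == base
        )
        total += abs(coeff) ** 2
    return total


def sliding_period3(sequence, window=351, step=3):
    """Period-3 power for each window position (sizes are assumed defaults)."""
    sequence = sequence.upper()
    last_start = len(sequence) - window
    return [
        period3_power(sequence[start:start + window])
        for start in range(0, last_start + 1, step)
    ]


if __name__ == "__main__":
    # Toy input: a strongly 3-periodic stretch followed by a homogeneous run.
    toy = "ATG" * 20 + "A" * 60
    for value in sliding_period3(toy, window=30, step=15):
        print(round(value, 3))
```

Peaks in the resulting profile are commonly read as candidate coding regions. Because the frequency resolution of this measure depends on the window length, short windows give a coarse spectrum, which is in line with the limitation on short sequences noted above.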
