SAVMD: An adaptive signal processing method for identifying protein coding regions

Abstract The identification of protein coding regions is a major research topic in gene prediction. A number of digital signal processing (DSP) based approaches, which exploit 3-base periodicity to detect coding regions, have been proposed. From these previously published approaches, we conclude that an effective method or filter for identifying protein coding regions should have three properties: independence from the window length, an effective and adaptive frequency response, and a fixed basic frequency of 1/3 f. However, most published approaches cannot satisfy all three simultaneously, which limits their identification accuracy. In this paper, we propose an adaptive signal processing method for identifying coding regions, called sinusoidal-assisted variational mode decomposition (SAVMD). The adaptability of SAVMD is reflected in two aspects: (i) the proposed method analyzes numerical sequences without needing any window information; (ii) the spectrum of the period-3 component is fitted automatically by SAVMD in the Fourier domain. As a result, the proposed method outperforms other DSP-based methods in identification accuracy, as verified by experimental results on five benchmark datasets. On a dataset in which most sequences contain undetermined nucleotides (UDT), SAVMD outperforms the model-dependent method AUGUSTUS as well as other model-independent methods. In addition, we conduct a comparative analysis of different numerical conversions of DNA sequences using SAVMD. The conversions found suitable for SAVMD in this experiment can serve as a reference for applying other time–frequency decomposition methods to gene prediction.
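The 3-base periodicity that these DSP-based approaches exploit can be illustrated with a minimal sketch. The code below is not SAVMD; it is the classical spectral measure, assuming a binary indicator (Voss-style) numerical conversion and evaluating the DFT of each indicator sequence at the frequency 1/3. The function name `period3_power` and the choice to let undetermined symbols such as `N` contribute nothing are assumptions made for illustration only.

```python
import cmath


def period3_power(seq):
    """Sum over A, C, G, T of |DFT at frequency 1/3|^2 of the indicator sequence.

    Illustrative sketch only: symbols other than A/C/G/T (e.g. N) match no
    indicator and therefore contribute zero, which is an assumption here.
    """
    seq = seq.upper()
    # Trim so the length is a multiple of 3 and frequency 1/3 is an exact DFT bin.
    n = len(seq) - len(seq) % 3
    total = 0.0
    for base in "ACGT":
        # DFT coefficient at k = n/3: sum of exp(-2*pi*j*i/3) over positions of `base`.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, c in enumerate(seq[:n])
            if c == base
        )
        total += abs(coeff) ** 2
    return total


if __name__ == "__main__":
    periodic = "ATG" * 60   # strongly period-3 toy sequence
    uniform = "A" * 180     # no period-3 structure
    print(period3_power(periodic), period3_power(uniform))
```

A sequence with strong codon-position bias yields a large value, while a sequence without period-3 structure yields a value near zero. Applying this measure along a genome normally requires a sliding window, which is the window-length dependence that the abstract identifies as a limitation of earlier methods.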
