A biological inspired fuzzy adaptive window median filter (FAWMF) for enhancing DNA signal processing

BACKGROUND AND OBJECTIVE Digital signal processing techniques commonly employ fixed-length window filters to process signal content. DNA signals differ from common digital signals because their content consists of nucleotides. Nucleotides carry genetic code context and exhibit fuzzy behavior owing to their structure and order in the DNA strand. Applying conventional fixed-length window filters to DNA signals produces spectral leakage and therefore signal noise. A biological-context-aware adaptive window filter is required to process DNA signals.

METHODS This paper introduces a biologically inspired fuzzy adaptive window median filter (FAWMF), which computes the fuzzy membership strength of nucleotides at each slide of the window and filters nucleotides by median filtering combined with s-shaped and z-shaped filters. Coding regions exhibit 3-base periodicity because an unbalanced nucleotide distribution produces a relatively high bias in nucleotide usage; FAWMF exploits this fundamental characteristic to suppress signal noise.

RESULTS Along with the adaptive response of FAWMF, a strong correlation was observed between median nucleotides and the Π-shaped filter (the combination of the s-shaped and z-shaped filters), which produced enhanced discrimination between coding and non-coding regions compared with conventional fixed-length window filters. The proposed FAWMF improves coding region identification by 40% to 125% relative to the other conventional window filters, tested on more than 250 benchmark and randomly selected DNA datasets from different organisms.

CONCLUSION This study shows that conventional fixed-length window filters applied to DNA signals do not achieve significant results because nucleotides carry genetic code context. The proposed FAWMF algorithm is adaptive and significantly outperforms them in processing DNA signal content.
Applied to a variety of DNA datasets, the algorithm produced noteworthy discrimination between coding and non-coding regions compared with conventional fixed-length window filters.
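
To make the building blocks named in the abstract concrete, the following Python sketch shows generic textbook versions of them: s-shaped, z-shaped and Π-shaped fuzzy membership functions, a fixed-length sliding-window measure of 3-base periodicity, and a plain median filter. It is not the FAWMF algorithm itself, whose adaptive windowing and membership rules are not specified here. All function names, the binary indicator mapping of the bases, and the default window width of 351 are assumptions chosen for illustration.

```python
import cmath
import math
from statistics import median


def smf(x, a, b):
    """S-shaped membership: 0 at or below a, 1 at or above b, spline in between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    mid = (a + b) / 2.0
    if x <= mid:
        return 2.0 * ((x - a) / (b - a)) ** 2
    return 1.0 - 2.0 * ((x - b) / (b - a)) ** 2


def zmf(x, a, b):
    """Z-shaped membership: mirror image of smf (1 at or below a, 0 at or above b)."""
    return 1.0 - smf(x, a, b)


def pimf(x, a, b, c, d):
    """Pi-shaped membership: product of a rising S (a..b) and a falling Z (c..d)."""
    return smf(x, a, b) * zmf(x, c, d)


def period3_power(seq, start, width):
    """DFT power at frequency 1/3 inside one fixed-length rectangular window.

    Each base is mapped to a binary indicator sequence (an assumed mapping),
    and the power of the four indicator sequences is summed.
    """
    window = seq[start:start + width]
    total = 0.0
    for base in "ACGT":
        acc = 0j
        for n, nucleotide in enumerate(window):
            if nucleotide == base:
                acc += cmath.exp(-2j * math.pi * n / 3)
        total += abs(acc) ** 2
    return total


def period3_profile(seq, width=351, step=3):
    """Slide the fixed-length window along seq and collect period-3 power."""
    return [period3_power(seq, s, width)
            for s in range(0, len(seq) - width + 1, step)]


def median_filter(values, k=5):
    """Plain running median of odd length k; windows are truncated at the edges."""
    half = k // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]
```

A sequence with strong codon-position bias, such as a repeated triplet, yields a large `period3_power`, while a sequence with no period-3 structure yields a value near zero. A fixed `width` is exactly the conventional setting the abstract argues against; an adaptive scheme would vary the window or weight its contents, for example with a membership function such as `pimf`.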
