Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity

Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods showed early success in detecting coding regions within genes. More recently, these methods have been used to assess similarity between DNA sequences. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transform, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free comparison method: it generates a genetic signature for each DNA sequence, and these signatures are then used to compute relative measures of similarity among sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species and on a set of avian influenza viral sequences, which are highly similar to one another. We compare phylogenetic trees generated with our technique against trees generated with traditional alignment-based techniques and show that the ICD method produces a highly accurate tree without requiring an alignment before sequence similarity is established.
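
The abstract does not spell out the ICD transformation, so the following is only an illustrative sketch of the general idea (signature from Fourier coefficients, then a distance between signatures), not the authors' implementation. It rests on several assumptions that are not taken from the text: each sequence is mapped to four binary indicator signals (one per nucleotide), sequences are implicitly zero-padded to a common length `n_points`, the "inter-coefficient difference" is taken to be the difference between consecutive power-spectrum coefficients, and signatures are compared by Euclidean distance. The function names `power_spectrum`, `icd_signature`, and `signature_distance` are hypothetical.

```python
import cmath
import math

BASES = "ACGT"


def power_spectrum(seq, n_points):
    """Summed DFT power spectrum of the four nucleotide indicator signals.

    Assumption: binary indicator mapping, with the sequence implicitly
    zero-padded to n_points samples so that sequences of different
    lengths yield spectra of the same size.
    """
    seq = seq.upper()
    if len(seq) > n_points:
        raise ValueError("n_points must be at least the sequence length")
    spectrum = []
    # The signals are real-valued, so frequencies 0..n_points//2 suffice.
    for k in range(n_points // 2 + 1):
        total = 0.0
        for base in BASES:
            # DFT coefficient of the indicator signal for this base.
            coeff = sum(
                cmath.exp(-2j * math.pi * k * n / n_points)
                for n, ch in enumerate(seq)
                if ch == base
            )
            total += abs(coeff) ** 2
        spectrum.append(total)
    return spectrum


def icd_signature(seq, n_points=64):
    """Assumed signature: differences between consecutive spectral coefficients."""
    spec = power_spectrum(seq, n_points)
    return [b - a for a, b in zip(spec, spec[1:])]


def signature_distance(seq_a, seq_b, n_points=64):
    """Euclidean distance between two signatures (assumed similarity measure)."""
    sig_a = icd_signature(seq_a, n_points)
    sig_b = icd_signature(seq_b, n_points)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sig_a, sig_b)))


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from the paper.
    print(signature_distance("ACGTACGT", "ACGTACGA"))
    print(signature_distance("ACGTACGT", "AAAACCCC"))
```

Under these assumptions, no alignment step is involved: each sequence is reduced to a fixed-length signature independently, and pairwise distances between signatures could then be collected into a distance matrix for a standard distance-based tree-building method.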
