Wavelet Analysis in Current Cancer Genome Research: A Survey

With the rapid development of next-generation sequencing technology, the amount of biological sequence data from cancer genomes is increasing exponentially. This growth calls for efficient and effective algorithms that can identify patterns hidden in the raw data and thereby expose the Achilles' heels of cancer. From a signal processing point of view, biological units of information, including DNA and protein sequences, can be viewed as one-dimensional signals. Researchers have therefore applied signal processing techniques to mine potentially significant patterns within these sequences. In recent years, wavelet transforms in particular have become an important mathematical analysis tool with a wide and ever-increasing range of applications. The versatility of wavelet techniques has forged new interdisciplinary bonds by offering common solutions to apparently diverse problems and by providing a unifying perspective on problems in cancer genome research. In this paper, we survey how wavelet analysis has been applied to questions in cancer bioinformatics. Specifically, we discuss several approaches to representing biological sequence data numerically and methods for applying wavelet analysis to the resulting numerical sequences.
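As a rough illustration of the two steps named above (numerical representation, then wavelet analysis), the following minimal sketch maps a DNA string to a binary indicator signal and applies one level of the Haar wavelet transform. The function names, the choice of a single-base indicator mapping, and the toy sequence are assumptions made for illustration only; they are not taken from any particular method covered by the survey.

```python
import math

def indicator_sequence(dna, base):
    """Binary indicator mapping: 1.0 where `dna` contains `base`, else 0.0."""
    return [1.0 if ch == base else 0.0 for ch in dna.upper()]

def haar_dwt_level(signal):
    """One level of the orthonormal Haar DWT on an even-length signal.

    Returns (approximation, detail) coefficient lists, each half as long
    as the input.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

# Hypothetical toy sequence, chosen only to keep the example short.
seq = "ATGCGGCA"
g_signal = indicator_sequence(seq, "G")      # where guanine occurs
approx, detail = haar_dwt_level(g_signal)    # coarse trend and local changes
```

The approximation coefficients summarize the local density of the chosen base, while the detail coefficients respond to changes between neighbouring positions. Because the transform is orthonormal, the total energy of the signal is preserved across the two coefficient sets.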