Waveform Mapping and Time-Frequency Processing of DNA and Protein Sequences

Current state-of-the-art approaches to biological sequence querying and alignment require preprocessing and lack robustness to repetitions in the sequence. In addition, they offer little support for efficiently querying subsequences, which is essential for tracking localized database matches. We propose a query-based alignment method for biological sequences that first maps sequences to time-domain waveforms and then processes the waveforms for alignment in the time-frequency plane. The mapping uses waveforms, such as Gaussian functions, that give each sequence a unique representation in the time-frequency plane. The proposed alignment method employs a robust querying algorithm built on a time-frequency signal expansion whose basis functions are matched to the basic waveform used in the mapped sequences. We demonstrated the resulting WAVEQuery approach on both deoxyribonucleic acid (DNA) and protein sequences, using the matching pursuit decomposition as the signal basis expansion. We specifically evaluated the alignment localization of WAVEQuery over repetitive database segments and showed that it operates in real time without preprocessing. We also showed that, for DNA queries containing repetitive segments, WAVEQuery significantly outperformed the Basic Local Alignment Search Tool (BLAST). Finally, we describe a generalized version of WAVEQuery based on the metaplectic transform for protein sequence structure prediction.
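The abstract does not give the mapping parameters, so the following Python sketch assumes a simple scheme to make the pipeline concrete: each nucleotide occupies a fixed time slot and is represented by a Gaussian atom modulated at a base-specific frequency. The slot length, Gaussian width, and the frequencies in `FREQ` are illustrative assumptions, not values from the paper, and the function names (`atom`, `map_sequence`, `project`, `matching_pursuit`, `decode`, `locate`) are hypothetical. The sketch decomposes a mapped waveform with greedy matching pursuit over a dictionary of the same Gaussian atoms, then localizes a query by projecting the raw database waveform onto the query's atoms at every time offset. It is not the WAVEQuery implementation.

```python
import cmath
import math

# Illustrative parameters: assumptions for this sketch, not the paper's values.
SLOT = 32                          # samples allotted to each nucleotide
SIGMA = SLOT / 4                   # Gaussian envelope width (samples)
HALF = int(4 * SIGMA)              # atom support is +/- 4 sigma around its center
FREQ = {"A": 0.10, "C": 0.20, "G": 0.30, "T": 0.40}  # cycles/sample per base


def atom(slot, base):
    """Time-shifted, frequency-modulated Gaussian for `base` in time `slot`.

    Returns (start_index, samples). All atoms share the same envelope shape,
    so they all have the same energy.
    """
    start = slot * SLOT
    center = start + HALF
    samples = [math.exp(-((n - center) ** 2) / (2 * SIGMA ** 2))
               * cmath.exp(2j * math.pi * FREQ[base] * n)
               for n in range(start, start + 2 * HALF)]
    return start, samples


ENERGY = sum(abs(s) ** 2 for s in atom(0, "A")[1])  # ||g||^2 of every atom


def map_sequence(seq):
    """Map a DNA string to a complex waveform: one Gaussian atom per base."""
    signal = [0j] * ((len(seq) - 1) * SLOT + 2 * HALF)
    for k, base in enumerate(seq):
        start, samples = atom(k, base)
        for i, s in enumerate(samples):
            signal[start + i] += s
    return signal


def n_slots(signal):
    """Number of nucleotide time slots covered by a mapped waveform."""
    return (len(signal) - 2 * HALF) // SLOT + 1


def project(signal, slot, base):
    """Projection coefficient <signal, g> / ||g||^2 onto one dictionary atom."""
    start, samples = atom(slot, base)
    return sum(signal[start + i] * g.conjugate()
               for i, g in enumerate(samples)) / ENERGY


def matching_pursuit(signal, threshold=0.5):
    """Greedy matching pursuit over the Gaussian dictionary.

    Returns a list of (slot, base, coefficient) atoms, strongest first.
    """
    residual = list(signal)
    candidates = [(k, b) for k in range(n_slots(signal)) for b in FREQ]
    picked = []
    for _ in range(len(candidates)):
        # Pick the atom best correlated with the current residual.
        k, b = max(candidates, key=lambda kb: abs(project(residual, *kb)))
        c = project(residual, k, b)
        if abs(c) < threshold:
            break  # only weak cross-talk between atoms is left
        start, samples = atom(k, b)
        for i, g in enumerate(samples):
            residual[start + i] -= c * g  # remove the atom's contribution
        picked.append((k, b, c))
    return picked


def decode(atoms):
    """Read the base sequence back from the atoms' time slots."""
    return "".join(b for _, b, _ in sorted(atoms, key=lambda a: a[0]))


def locate(query, db_signal, tol=0.5):
    """Return every slot offset where the query's atoms best match the database.

    The database waveform is used as-is (no decomposition or index): each
    offset is scored by projecting it onto the query atoms shifted in time.
    """
    q_atoms = [(k, b) for k, b, _ in matching_pursuit(map_sequence(query))]
    scores = [sum(abs(project(db_signal, k + off, b)) for k, b in q_atoms)
              for off in range(n_slots(db_signal) - len(query) + 1)]
    best = max(scores)
    return [off for off, s in enumerate(scores) if s > best - tol]
```

For example, `locate("ACGT", map_sequence("ACGTTGCAACGTAAGT"))` returns `[0, 8]`, reporting both occurrences of the repeated segment rather than a single hit. The Gaussian width and frequency spacing were chosen so that atoms in neighboring slots or with different bases are nearly orthogonal; under that assumption each greedy pursuit step recovers one base, and a query's score at an offset is close to the number of matching bases.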
