Frequency-domain analysis of biomolecular sequences

MOTIVATION: Frequency-domain analysis of biomolecular sequences is hindered by their representation as strings of characters. If a numerical value is assigned to each character, the resulting numerical sequences become readily amenable to digital signal processing.

RESULTS: We introduce new computational and visual tools for biomolecular sequence analysis. In particular, we provide an optimization procedure that improves on the performance of traditional Fourier analysis in distinguishing coding from noncoding regions in DNA sequences. We also show that the phase of a properly defined Fourier transform is a powerful predictor of the reading frame of protein coding regions. The resulting color maps help in visually identifying not only the existence of protein coding areas on both DNA strands, but also the coding direction and the reading frame of each exon. Furthermore, we demonstrate that color spectrograms can visually provide, in the form of local 'texture', significant information about biomolecular sequences, thus facilitating understanding of their local nature, structure and function.
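To make the character-to-number mapping concrete, the following is a minimal sketch of one common baseline: each base gets a binary indicator sequence, and the spectral power at the period-3 frequency (DFT index N/3) is summed over the four bases. It illustrates the kind of traditional Fourier measure the abstract refers to, not the optimization procedure or the phase-based reading-frame predictor, which are not specified here. The function names (`indicator_sequences`, `dft_at`, `period3_power`) are hypothetical and chosen only for this illustration.

```python
import cmath

BASES = "ACGT"


def indicator_sequences(dna):
    """Map a DNA string to four binary indicator sequences, one per base."""
    dna = dna.upper()
    return {b: [1 if ch == b else 0 for ch in dna] for b in BASES}


def dft_at(signal, k):
    """Single DFT coefficient X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_total = len(signal)
    return sum(
        x * cmath.exp(-2j * cmath.pi * k * n / n_total)
        for n, x in enumerate(signal)
    )


def period3_power(dna):
    """Sum over the four bases of |X_b[N/3]|^2 for one window.

    Assumption of this sketch: the window length N is a positive
    multiple of 3, so that N/3 is an exact DFT index.
    """
    n_total = len(dna)
    if n_total == 0 or n_total % 3 != 0:
        raise ValueError("window length must be a positive multiple of 3")
    k = n_total // 3
    return sum(
        abs(dft_at(seq, k)) ** 2
        for seq in indicator_sequences(dna).values()
    )
```

In practice such a measure would be evaluated over a sliding window along the sequence; a strongly codon-periodic window such as a repeated triplet yields a large value, while a window with no period-3 structure (for example a homopolymer) yields a value near zero.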
