Coding Region Prediction Based on a Universal DNA Sequence Representation Method

Graphical representation of DNA sequences provides a simple and intuitive way of viewing, anchoring, and comparing gene structures, so a simple, non-degenerate representation method is attractive to both biologists and computational biologists. In this study, we present a universal graphical representation method for DNA sequences based on S.S.-T. Yau's method. The method uses a trigonometric function to represent the four nucleotides A, G, C, and T. We introduce some interesting characteristics of this universal representation. We then apply frequency analysis to DNA sequences encoded with the representation, demonstrating possible applications in coding region prediction and sequence analysis. Based on statistical results from this frequency analysis, we present a simple coding region predictor and an optimized one. An experiment on the widely used ROSETTA data set demonstrates that the performance of the optimized predictor is comparable to that of other popular methods.
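To make the idea of frequency analysis on a trigonometric encoding concrete, the following is a minimal sketch and not the predictor described in this study. The abstract does not give the actual trigonometric function, so the angles in `ANGLES` are hypothetical, as are the function names and the window and step sizes. The sketch relies only on the well-known observation that protein-coding regions tend to show a spectral component at frequency 1/3, and it scores sliding windows by the power of that component.

```python
import cmath
import math

# Hypothetical angles (radians) for each nucleotide. The paper's actual
# trigonometric mapping is not stated in the abstract; these values are
# placeholders chosen only so that the four bases get distinct values.
ANGLES = {"A": -math.pi / 3, "G": -math.pi / 6, "C": math.pi / 6, "T": math.pi / 3}


def encode(seq):
    """Map each nucleotide to sin(angle), giving a real-valued signal."""
    return [math.sin(ANGLES[base]) for base in seq.upper()]


def period3_power(signal):
    """Normalized power of the DFT component at frequency 1/3."""
    n = len(signal)
    mean = sum(signal) / n  # remove the DC offset before transforming
    coeff = sum(
        (x - mean) * cmath.exp(-2j * math.pi * k / 3)
        for k, x in enumerate(signal)
    )
    return abs(coeff) ** 2 / n


def sliding_scores(seq, window=351, step=3):
    """Return (start, period-3 power) for each window along the sequence."""
    signal = encode(seq)
    return [
        (start, period3_power(signal[start:start + window]))
        for start in range(0, len(signal) - window + 1, step)
    ]
```

A window whose score exceeds some threshold would be flagged as a candidate coding region. Choosing that threshold, and any optimization of the encoding, is where a real predictor would need the statistical results the abstract refers to.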
