Similarity analysis of DNA sequences based on a compact representation

Randić et al. proposed a significant graphical representation for DNA sequences that is very compact and avoids loss of information. In this paper, we build a fast algorithm for this graphical representation with time complexity O(n^2), and identify a further important advantage of the representation: it has no degeneracy. Moreover, we propose a new method for similarity analysis of DNA sequences based on the representation. The approach adopts the four elements of a covariance matrix as a descriptor, and is illustrated on the first exon of the beta-globin genes of 11 different species.
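
The descriptor idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation. The abstract does not specify how a sequence is mapped to coordinates, so the input here is assumed to be a list of 2-D points already produced by some graphical representation. The population (divide-by-n) covariance and the Euclidean distance between descriptors are also assumptions, and the function names `covariance_descriptor` and `descriptor_distance` are hypothetical.

```python
from math import sqrt


def covariance_descriptor(points):
    """Return the four entries (cxx, cxy, cyx, cyy) of the 2x2 covariance
    matrix of a list of 2-D points.

    Assumption: population covariance (divide by n). The source does not
    say which normalisation is used.
    """
    n = len(points)
    if n == 0:
        raise ValueError("need at least one point")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    cyy = sum((y - mean_y) ** 2 for _, y in points) / n
    cxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # The matrix is symmetric, so the two off-diagonal entries are equal.
    return (cxx, cxy, cxy, cyy)


def descriptor_distance(a, b):
    """Euclidean distance between two 4-element descriptors
    (an assumed choice of dissimilarity measure)."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Hypothetical point sets standing in for two sequences' curves.
curve_a = [(0, 0), (1, 1), (2, 2)]
curve_b = [(0, 0), (1, 0), (2, 0)]
d = descriptor_distance(covariance_descriptor(curve_a),
                        covariance_descriptor(curve_b))
```

A smaller distance between two descriptors would then be read as greater similarity between the corresponding sequences. Computing such a distance for every pair of species would give a similarity table of the kind the abstract alludes to.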
