Novel graphical representation and numerical characterization of DNA sequences

Modern sequencing techniques have produced a wealth of DNA sequence data, making the analysis and comparison of sequences an important but difficult task. In this paper, by regarding the dinucleotide as a 2-combination of the multiset {∞·A, ∞·G, ∞·C, ∞·T}, a novel 3-D graphical representation of a DNA sequence is proposed, and its projections onto the (x,y), (y,z) and (x,z) planes are also discussed. In addition, based on the idea of a "piecewise function", a cell-based descriptor vector is constructed to characterize the DNA sequence numerically. The utility of the approach is illustrated by phylogenetic analysis of four datasets.
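
To make the combinatorial idea concrete, the following minimal sketch enumerates the 2-combinations of the multiset {∞·A, ∞·G, ∞·C, ∞·T} and counts the overlapping dinucleotides of a sequence by unordered class. It illustrates only the multiset viewpoint: the abstract does not give the 3-D coordinate assignment or the cell-based descriptor, so neither is reproduced here. The names `CLASSES`, `dinucleotide_class` and `class_counts` are hypothetical and chosen for this example.

```python
from collections import Counter
from itertools import combinations_with_replacement

BASES = "AGCT"  # base order follows the multiset as written in the abstract

# 2-combinations with repetition: unordered pairs of bases, C(4+2-1, 2) = 10 in total.
CLASSES = ["".join(pair) for pair in combinations_with_replacement(BASES, 2)]


def dinucleotide_class(pair):
    """Map an ordered dinucleotide such as 'GA' to its unordered class 'AG'."""
    return "".join(sorted(pair, key=BASES.index))


def class_counts(seq):
    """Count overlapping dinucleotides of seq by unordered class.

    Pairs containing symbols other than A, G, C, T are skipped
    (an assumption made for this sketch).
    """
    seq = seq.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    return Counter(
        dinucleotide_class(p) for p in pairs if all(b in BASES for b in p)
    )


if __name__ == "__main__":
    print(CLASSES)
    print(class_counts("ATGGTA"))
```

Under this view the 16 ordered dinucleotides collapse into 10 classes, since for example AG and GA are the same 2-combination. How the order within a pair is then recovered in the graphical representation is not stated in the abstract.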
