A novel representation of DNA sequence based on CMI coding

Graphical representation of DNA sequences provides a simple and intuitive way of analyzing and sorting gene sequences, and researchers continue to look for more suitable methods. In this study, a new graphical representation is presented. The method adopts CMI coding to represent the four nucleotides A, G, C and T. Our approach considers not only the structure of the sequence but also the chemical structure of the DNA. We test the method on several data sets, and the experimental results demonstrate that the representation is effective.
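
As a rough illustration of the idea of applying CMI coding to nucleotides, the sketch below encodes a DNA string with the standard coded mark inversion line code, in which a binary 0 becomes 01 and a binary 1 alternates between 00 and 11. The abstract does not state how nucleotides are assigned to bits, so the `NUCLEOTIDE_BITS` table, the choice to start the alternation with 00, and the function names are all assumptions made for this example, not the authors' published mapping.

```python
# Hypothetical 2-bit labels for the four nucleotides (an assumption for
# illustration; the paper's actual assignment is not given in the abstract).
NUCLEOTIDE_BITS = {"A": "00", "G": "01", "C": "10", "T": "11"}


def cmi_encode(bits: str) -> str:
    """Apply standard CMI line coding to a string of '0'/'1' characters.

    A 0 is coded as '01'. A 1 is coded alternately as '00' and '11';
    starting with '00' is an arbitrary choice made here.
    """
    out = []
    next_one = "00"
    for b in bits:
        if b == "0":
            out.append("01")
        elif b == "1":
            out.append(next_one)
            # Flip the symbol used for the next binary 1.
            next_one = "11" if next_one == "00" else "00"
        else:
            raise ValueError(f"not a bit: {b!r}")
    return "".join(out)


def dna_to_cmi(seq: str) -> str:
    """Map each nucleotide to its assumed 2-bit label, then CMI-encode."""
    bits = "".join(NUCLEOTIDE_BITS[n] for n in seq.upper())
    return cmi_encode(bits)
```

Under these assumptions each nucleotide expands to four CMI symbols, which could then be plotted as a curve; how the paper turns the code into a graph is not specified in the abstract.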
