Relations between Shannon entropy and genome order index in segmenting DNA sequences.

Shannon entropy H and the genome order index S are both used to segment DNA sequences. Zhang et al. [Phys. Rev. E 72, 041917 (2005)] found that the two schemes are equivalent when a DNA sequence is converted to a binary sequence of the symbols S (strong hydrogen bond, G or C) and W (weak hydrogen bond, A or T), and left the mathematical proof to interested mathematicians. In this paper, a possible mathematical explanation is given. Moreover, we find that Chargaff's parity rule 2 is a necessary condition for the equivalence, and that the equivalence disappears when a DNA sequence is treated as a four-symbol sequence. Finally, we propose that the quantity S - 2^(-H) may be related to species evolution.
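As a rough illustration of the quantities named above, the following sketch computes H, S, and S - 2^(-H) for a sequence in both the two-symbol (S/W) and the four-symbol alphabet. It assumes the usual definitions, H = -sum(p_i log2 p_i) and S = sum(p_i^2) over the symbol frequencies p_i; the function names and the toy sequence are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Assumed mapping to the two-symbol alphabet: G/C -> S (strong), A/T -> W (weak).
SW = {"A": "W", "T": "W", "G": "S", "C": "S"}

def frequencies(seq, mapping=None):
    """Relative symbol frequencies, optionally after recoding each base."""
    symbols = [mapping[b] for b in seq] if mapping else list(seq)
    n = len(symbols)
    return {s: c / n for s, c in Counter(symbols).items()}

def shannon_entropy(freqs):
    """H = -sum p log2 p, in bits (assumed definition)."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)

def order_index(freqs):
    """S = sum p^2 (assumed definition of the genome order index)."""
    return sum(p * p for p in freqs.values())

def gap(freqs):
    """The quantity S - 2^(-H) mentioned in the abstract."""
    return order_index(freqs) - 2.0 ** (-shannon_entropy(freqs))

if __name__ == "__main__":
    seq = "ATGCGCGTATTAGCGCAATT"  # toy sequence, for illustration only
    for label, f in (("S/W", frequencies(seq, SW)), ("ACGT", frequencies(seq))):
        print(label, shannon_entropy(f), order_index(f), gap(f))
```

In the two-symbol case both H and S depend on a single frequency p, which is why a direct relation between them is plausible there; with four symbols the frequencies can vary independently, so no such one-to-one relation should be expected in general.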
