Shannon Entropy Analysis of the Genome Code

This paper studies the chromosome information of twenty-five species, namely mammals, fishes, birds, insects, nematodes, fungus, and one plant. A quantification scheme inspired by the state-space representation of dynamical systems is formulated. Using this algorithm, the information in each chromosome is converted into a two-dimensional distribution. The resulting plots are then analyzed and characterized by means of Shannon entropy. The large volume of information is condensed by averaging the lengths and entropy values for each species. The results are easy to visualize and reveal quantitative global genomic information.
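As an illustration of the pipeline summarized above, the following minimal sketch builds a two-dimensional distribution from a DNA string and computes its Shannon entropy, H = -Σ p log2 p. The abstract does not specify the exact quantification scheme, so the mapping used here (relative frequencies of consecutive symbol pairs, in the spirit of a state-space plot of x(t) against x(t+1)) is an assumption, and the function names `pair_distribution` and `shannon_entropy` are hypothetical.

```python
import math
from collections import Counter


def pair_distribution(seq, alphabet="ACGT"):
    """Relative frequencies of consecutive symbol pairs (x_t, x_{t+1}).

    Assumption: this pair mapping stands in for the paper's scheme,
    which the abstract does not describe in detail.
    Pairs containing symbols outside `alphabet` (e.g. 'N') are skipped.
    """
    valid = set(alphabet)
    counts = Counter(
        (a, b) for a, b in zip(seq, seq[1:]) if a in valid and b in valid
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def shannon_entropy(dist):
    """Shannon entropy in bits of a discrete distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def mean_entropy(chromosomes):
    """Average entropy over the chromosomes of one species (list of strings)."""
    values = [shannon_entropy(pair_distribution(c)) for c in chromosomes]
    return sum(values) / len(values)
```

With a four-letter alphabet the pair distribution has at most 16 cells, so the entropy of this sketch lies between 0 bits (a single repeated pair) and 4 bits (all pairs equally likely).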
