A Mutual Information Based Sequence Distance For Vertebrate Phylogeny Using Complete Mitochondrial Genomes

Traditional sequence distances require alignment. This paper defines a new alignment-free sequence distance based on mutual information. The distance is computed from compositional vectors of the DNA or protein sequences in complete genomes. We first establish the mathematical foundation of the distance, then apply it to analyze the phylogenetic relationships of 64 vertebrates using their complete mitochondrial genomes. In the resulting phylogenetic tree, the mitochondrial genomes separate into three major groups: one corresponds to mammals, one to fish, and one to Archosauria (including birds and reptiles). The topology of the tree based on the new distance roughly agrees with currently known vertebrate phylogenies.
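
The abstract does not give the formula for the distance, so the following Python sketch only illustrates the general idea and should not be read as the paper's definition. It builds a k-string compositional vector and then computes a normalized mutual information distance, 1 - I(X;Y)/H(X,Y), from a histogram estimate over the paired vector components. The function names, the choice of k, the equal-width binning, and the normalization are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def composition_vector(seq, k=3, alphabet="ACGT"):
    """Relative frequency of every length-k string over `alphabet`, in a fixed order.

    Windows containing symbols outside `alphabet` (e.g. 'N') are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[w] for w in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-strings")
    return [counts[w] / total for w in kmers]


def _entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def _bin_indices(values, bins):
    """Equal-width discretization of `values` into `bins` bins (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [min(int((x - lo) / (hi - lo) * bins), bins - 1) for x in values]


def mi_distance(u, v, bins=4):
    """Illustrative distance 1 - I(X;Y)/H(X,Y) between two compositional vectors.

    Each k-string is treated as one paired observation (u_i, v_i); this
    estimator is a stand-in, not the definition used in the paper.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    n = len(u)
    bu, bv = _bin_indices(u, bins), _bin_indices(v, bins)
    h_x = _entropy(c / n for c in Counter(bu).values())
    h_y = _entropy(c / n for c in Counter(bv).values())
    h_xy = _entropy(c / n for c in Counter(zip(bu, bv)).values())
    if h_xy == 0:
        return 0.0  # both vectors are constant after binning
    mutual_information = h_x + h_y - h_xy
    return 1.0 - mutual_information / h_xy


if __name__ == "__main__":
    a = composition_vector("ACGTACGTTTGACA", k=2)
    b = composition_vector("GGGCCCATATATGC", k=2)
    print(mi_distance(a, a))  # identical vectors give distance 0
    print(mi_distance(a, b))
```

In a phylogenetic setting, a pairwise matrix of such distances would be passed to a tree-building method such as neighbor joining. Real mitochondrial genomes would need a larger k and a more careful estimator than the toy strings used here.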
