An information-based sequence distance and its application to whole mitochondrial genome phylogeny

MOTIVATION: Traditional sequence distances require an alignment and are therefore not directly applicable to whole genome phylogeny, where events such as rearrangements make full-length alignments impossible. We present a sequence distance that works on unaligned sequences, based on the information-theoretic concept of Kolmogorov complexity, together with a program to estimate this distance.

RESULTS: We establish the mathematical foundations of our distance and illustrate its use by constructing a phylogeny of the Eutherian orders from complete unaligned mitochondrial genomes. This phylogeny is consistent with the commonly accepted one for the Eutherians. A second, larger mammalian dataset is also analyzed, yielding a phylogeny generally consistent with the commonly accepted one for the mammals.

AVAILABILITY: The program to estimate our sequence distance is available at http://www.cs.cityu.edu.hk/~cssamk/gencomp/GenCompress1.htm. The distance matrices used to generate our phylogenies are available at http://www.math.uwaterloo.ca/~mli/distance.html.
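
Kolmogorov complexity is not computable, so any practical estimate of such a distance has to substitute the output size of a real compressor. The abstract does not state the distance formula, so the sketch below is only an illustration of the general idea and not the authors' method: it assumes the widely used normalized compression distance as a stand-in formula and Python's general-purpose `zlib` as a stand-in for a DNA-specific compressor. The names `compressed_size`, `ncd`, and `random_dna` are hypothetical, and the sequences are synthetic.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data: a crude, computable
    upper-bound proxy for Kolmogorov complexity (assumption of this sketch)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two unaligned sequences.

    This is a stand-in formula, not the distance defined in the paper:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with C = compressed_size.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_dna(length: int, seed: int) -> bytes:
    """Synthetic DNA-like sequence for demonstration only."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length)).encode("ascii")


# Two unrelated synthetic sequences; no alignment is needed at any point.
seq_a = random_dna(2000, seed=1)
seq_b = random_dna(2000, seed=2)

d_same = ncd(seq_a, seq_a)   # a sequence compared with itself
d_diff = ncd(seq_a, seq_b)   # two independent sequences

print(f"distance(a, a) = {d_same:.3f}")
print(f"distance(a, b) = {d_diff:.3f}")
```

A sequence compared with itself should score much lower than two independent sequences, because the second copy compresses almost for free given the first. A general-purpose compressor such as `zlib` has a limited window and ignores DNA-specific features like reverse complements, so for real genomes a dedicated DNA compressor would be expected to give sharper estimates.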
