An Evolutionary Distance Based on Maximal Unique Matches

We propose to compare complete genomes using a new evolutionary distance based on maximal unique matches (MUMs). We first determine the minimum length a MUM must have to be significant, then search for the significant MUMs shared by two genomes with a linear-time algorithm. By simulating sets of sequences that evolve along a given tree topology, we show that this distance varies monotonically with the number of evolutionary events and that the neighbor-joining (NJ) method applied to it yields phylogenetic trees very close to the initial ones. Finally, we apply this very fast method to compare bacterial genomes within the class Gammaproteobacteria.
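To make the central notion concrete, the following is a minimal sketch of MUM detection, not the linear-time algorithm mentioned above: it is a naive scan that is only practical for short strings. It assumes the usual definition of a MUM (a match that occurs exactly once in each sequence and cannot be extended to the left or right). The function names `count_overlapping` and `find_mums` and the `min_len` parameter are illustrative choices; `min_len` merely stands in for the significance threshold, whose derivation is not reproduced here.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, overlapping ones included."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a, b, min_len=1):
    """Return (pos_in_a, pos_in_b, length) for each maximal unique match.

    Naive sketch: every left-maximal pair of positions is extended to the
    right as far as the sequences agree, then uniqueness is checked in both
    sequences. min_len is a hypothetical stand-in for the minimum
    significant MUM length.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right while the characters agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k < min_len:
                continue
            match = a[i:i + k]
            # Keep the match only if it is unique in both sequences.
            if count_overlapping(a, match) == 1 and count_overlapping(b, match) == 1:
                mums.append((i, j, k))
    return mums


if __name__ == "__main__":
    print(find_mums("GATTACA", "CATTAGG", min_len=3))
```

On complete genomes this quadratic scan would be far too slow; a suffix-tree-based search is what makes the linear-time bound achievable. This sketch also considers only the forward strand.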
