Improving Inversion Median Computation Using Commuting Reversals and Cycle Information

In the past decade, genome rearrangements have attracted increasing attention from both biologists and computer scientists as a new type of data for phylogenetic analysis. Methods for reconstructing phylogeny from genome rearrangements include distance-based methods, MCMC methods and direct optimization methods. The latter, pioneered by Sankoff and extended with the software suites GRAPPA and MGR, is the most accurate approach, but is very limited due to the difficulty of its scoring procedure: it must solve multiple instances of the median problem to compute the score of a given tree. The median problem is known to be NP-hard, and all existing solvers are extremely slow when the genomes are distant. In this paper, we present a new inversion median heuristic for unichromosomal genomes. The new method works by applying sets of reversals in a batch, where all reversals in a set commute and none breaks the cycle of any other. Our testing on simulated datasets shows that this method is much faster than the leading solver for difficult datasets with only a slight accuracy penalty, yet retains better accuracy than other heuristics of comparable speed. This new method will dramatically increase the speed of current direct optimization methods and enable us to extend the range of their applicability to organellar and small nuclear genomes with more than 50 inversions along each edge. As a further improvement, this new method can very quickly produce reasonable solutions to problems with hundreds of genes.
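To make the notion of commuting reversals concrete, the following is a minimal Python sketch, not the paper's heuristic. It assumes a genome is a signed permutation stored as a list of integers and that a reversal is given by an inclusive pair of 0-based positions; the names `apply_reversal` and `commute` are hypothetical. The check simply applies the two reversals in both orders and compares the results. The cycle condition mentioned in the abstract is not modeled here.

```python
from typing import List, Tuple

Genome = List[int]           # signed permutation, e.g. [1, -3, 2]
Reversal = Tuple[int, int]   # inclusive 0-based positions (i, j), i <= j (assumed encoding)


def apply_reversal(genome: Genome, rev: Reversal) -> Genome:
    """Reverse the segment genome[i..j] and flip the sign of each gene in it."""
    i, j = rev
    segment = [-g for g in reversed(genome[i:j + 1])]
    return genome[:i] + segment + genome[j + 1:]


def commute(genome: Genome, a: Reversal, b: Reversal) -> bool:
    """Return True if applying a then b gives the same genome as b then a."""
    ab = apply_reversal(apply_reversal(genome, a), b)
    ba = apply_reversal(apply_reversal(genome, b), a)
    return ab == ba


genome = [1, -3, 2, 4, -5, 6]
disjoint = commute(genome, (0, 1), (3, 5))     # reversals on disjoint segments
overlapping = commute(genome, (0, 2), (1, 3))  # reversals on partly overlapping segments
```

In this position-based encoding, reversals acting on disjoint segments commute, while the partly overlapping pair above does not, which is why only mutually commuting reversals could be applied together as a batch without the order mattering.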
