Sorting Signed Permutations by Inversions in O(n log n) Time

The study of genomic inversions (or reversals) has been a mainstay of computational genomics for nearly 20 years. After the initial breakthrough of Hannenhalli and Pevzner, who gave the first polynomial-time algorithm for sorting signed permutations by inversions, improved algorithms were designed, culminating in an optimal linear-time algorithm for computing the inversion distance and a subquadratic algorithm for producing a shortest sequence of inversions, a task also known as sorting by inversions. The question of whether sorting by inversions could be done in O(n log n) time remained open. In this paper, we present a qualified answer to this question by providing two new sorting algorithms: a simple and fast randomized algorithm and a deterministic refinement. The deterministic algorithm runs in O(n log n + kn) time, where k is a data-dependent parameter. We provide the results of extensive experiments showing that both the average and the standard deviation of k are small constants, independent of the size of the permutation. We conclude (but do not prove) that almost all signed permutations can be sorted by inversions in O(n log n) time.
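To make the problem statement concrete, the following is a minimal sketch of what a signed inversion does and of what "a shortest sequence of inversions" means. It is not the randomized or deterministic algorithm of the paper: it is a brute-force breadth-first search that takes exponential time and is usable only on very small permutations. The function names `invert` and `shortest_sorting_sequence` and the 0-based inclusive index convention are assumptions made for this illustration.

```python
from collections import deque


def invert(perm, i, j):
    """Apply a signed inversion to perm[i..j] (0-based, inclusive).

    The segment is reversed and the sign of every element in it is flipped.
    """
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def shortest_sorting_sequence(perm):
    """Return a shortest list of (i, j) inversions that sorts perm.

    Brute-force BFS over all signed permutations: illustration only,
    NOT the O(n log n + kn) method described in the abstract.
    """
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    parent = {start: None}  # state -> (previous state, inversion used)
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            break
        for i in range(len(cur)):
            for j in range(i, len(cur)):
                nxt = invert(cur, i, j)
                if nxt not in parent:
                    parent[nxt] = (cur, (i, j))
                    queue.append(nxt)
    # Walk back from the identity permutation to recover the inversions.
    steps = []
    node = target
    while parent[node] is not None:
        node, move = parent[node]
        steps.append(move)
    steps.reverse()
    return steps


if __name__ == "__main__":
    # One inversion of positions 1..2 turns (1, -3, -2, 4) into (1, 2, 3, 4).
    print(shortest_sorting_sequence((1, -3, -2, 4)))
```

The length of the returned list is the inversion distance of the input. The search visits up to 2^n * n! states, which is why the polynomial-time and near-linear-time algorithms discussed above matter in practice.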
