Sorting signed permutations by reversals, revisited

The problem of sorting signed permutations by reversals (SBR) is a fundamental problem in computational molecular biology. Given a signed permutation, the goal is to find a shortest sequence of reversals that transforms it into the positive identity permutation, where a reversal takes a segment of the permutation, reverses it, and flips the signs of its elements. In this paper we describe a randomized algorithm for SBR. The algorithm tries to sort the permutation by repeatedly performing a random oriented reversal. This process is in fact a random walk on the graph whose nodes are permutations and in which an arc from π to π′ corresponds to an oriented reversal that transforms π into π′. We show that if this random walk stops at the identity permutation, then we have found a shortest sequence. We give empirical evidence that this process indeed succeeds with high probability on a random permutation. To implement our algorithm we describe a data structure for maintaining a permutation that allows us to draw an oriented reversal uniformly at random and to perform it in sub-linear time. With this data structure we can implement the random walk in O(n^(3/2) log n) time, thus obtaining an algorithm for SBR that almost always runs in sub-quadratic time. The data structures we present may also be of independent interest for developing other algorithms for SBR and for other problems. Finally, we present the first efficient parallel algorithm for SBR. We obtain this result by developing a fast, parallelizable implementation of the recent algorithm of Bergeron (Proceedings of CPM, 2001, pp. 106-117) for sorting signed permutations by reversals. Our implementation runs in O(n^2 log n) time on a regular RAM, and in O(n log n) time on a PRAM using n processors.
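To make the reversal operation and the random walk concrete, here is a minimal Python sketch. It is not the paper's implementation: it enumerates oriented reversals naively in quadratic time per step instead of using the sub-linear data structure, and it assumes the common oriented-pair formulation, in which two elements of the framed permutation 0, π, n+1 with consecutive absolute values and opposite signs induce an oriented reversal. The function names `reverse`, `oriented_reversals`, and `random_walk_sort` are illustrative, not taken from the paper.

```python
import random


def reverse(perm, i, j):
    """Reverse perm[i..j] (0-based, inclusive) and flip the sign of each element."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def oriented_reversals(perm):
    """Return the distinct reversals (i, j) induced by oriented pairs.

    Assumption: an oriented pair is a pair of elements of the framed
    permutation 0, perm, n+1 whose signs differ and whose sum is +1 or -1.
    Indices in the result refer to perm itself (the frame is never reversed).
    """
    framed = [0] + list(perm) + [len(perm) + 1]
    found = set()
    for i in range(len(framed)):
        for j in range(i + 1, len(framed)):
            a, b = framed[i], framed[j]
            if (a < 0) == (b < 0):
                continue  # same sign (0 counts as positive): not oriented
            if a + b == 1:
                lo, hi = i, j - 1      # reverse framed[i..j-1]
            elif a + b == -1:
                lo, hi = i + 1, j      # reverse framed[i+1..j]
            else:
                continue
            found.add((lo - 1, hi - 1))  # shift from framed to perm indices
    return sorted(found)


def random_walk_sort(perm, rng):
    """Repeatedly apply a uniformly random oriented reversal.

    Returns the list of reversals if the walk reaches the identity,
    or None if it gets stuck at an unsorted permutation that has no
    oriented reversal.
    """
    perm = list(perm)
    identity = list(range(1, len(perm) + 1))
    steps = []
    while perm != identity:
        candidates = oriented_reversals(perm)
        if not candidates:
            return None
        i, j = rng.choice(candidates)
        perm = reverse(perm, i, j)
        steps.append((i, j))
    return steps


if __name__ == "__main__":
    start = [3, 1, 6, 5, -2, 4]
    print(random_walk_sort(start, random.Random(0)))
```

A `None` result corresponds to the walk stopping somewhere other than the identity; a caller could simply restart the walk with fresh randomness or fall back to a deterministic SBR algorithm.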
