String Rearrangement Metrics: A Survey

Traditional pattern matching assumes that the elements of the input strings appear in the correct order, while the content of the elements may be erroneous. Motivated by questions arising in text editing, computational biology, BitTorrent and video on demand, and computer architecture, a new pattern matching paradigm was recently proposed in [2]. In this model the pattern content remains intact, but the relative positions of its elements may change. Several papers followed the initial definition, each revealing new aspects of string rearrangement metrics. This unified view has already proven its value by enabling the solution of an open problem posed by the mathematician Cayley in 1849. It has also given better insight into problems previously studied only in limited settings, such as the behavior of different cost functions, and has yielded results for cost functions that earlier research had not sufficiently analyzed. A general understanding of the model is now beginning to coalesce. This survey presents an overview of this recent direction of research: the problems, the methodologies, and the state of the art.

[1]  Mark Jerrum, et al. The Complexity of Finding Minimum-Length Generator Sequences, 1985, Theor. Comput. Sci.

[2]  Lenwood S. Heath, et al. Sorting by Short Swaps, 2003, J. Comput. Biol.

[3]  Yonatan Aumann, et al. Approximate string matching with address bit errors, 2008, Theor. Comput. Sci.

[4]  Steven Skiena, et al. Improved bounds on sorting with length-weighted reversals, 2004, SODA '04.

[5]  Robert A. Wagner, et al. An Extension of the String-to-String Correction Problem, 1975, JACM.

[6]  Steven Skiena, et al. Pattern matching with address errors: rearrangement distances, 2006, SODA '06.

[7]  Vineet Bafna, et al. Sorting by Transpositions, 1998, SIAM J. Discret. Math.

[8]  Amit Kumar, et al. Sorting and selection with structured costs, 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[9]  Wojciech Rytter, et al. Extracting Powers and Periods in a String from Its Runs Structure, 2010, SPIRE.

[10]  Lenwood S. Heath, et al. Sorting by Bounded Block-moves, 1998, Discret. Appl. Math.

[11]  Moshe Lewenstein, et al. Approximate Swapped Matching, 2000, FSTTCS.

[12]  M. Fischer, et al. String-Matching and Other Products, 1974.

[13]  Piotr Berman, et al. Fast Sorting by Reversal, 1996, CPM.

[14]  Moshe Lewenstein, et al. Function Matching: Algorithms, Applications, and a Lower Bound, 2003, ICALP.

[15]  Gad M. Landau, et al. Interchange rearrangement: The element-cost model, 2008, Theor. Comput. Sci.

[16]  E. W. Ng. Symbolic and Algebraic Computation, 1979, Lecture Notes in Computer Science.

[17]  Donald E. Knuth, et al. Fast Pattern Matching in Strings, 1977, SIAM J. Comput.

[18]  Robert W. Irving, et al. Sorting Strings by Reversals and by Transpositions, 2001, SIAM J. Discret. Math.

[19]  Robin Milner, et al. On Observing Nondeterminism and Concurrency, 1980, ICALP.

[20]  Alberto Caprara, et al. Sorting by reversals is difficult, 1997, RECOMB '97.

[21]  Michael Hoffmann, et al. Algorithms - ESA 2007, 15th Annual European Symposium, Eilat, Israel, October 8-10, 2007, Proceedings, 2007, ESA.

[22]  Amihood Amir, et al. Asynchronous Pattern Matching, 2006, CPM.

[23]  Andrew McGregor, et al. Sorting and Selection with Random Costs, 2007, LATIN.

[24]  Ely Porat, et al. On the Cost of Interchange Rearrangement in Strings, 2007, SIAM J. Comput.

[25]  Andrew Chi-Chih Yao, et al. Some complexity questions related to distributive computing (Preliminary Report), 1979, STOC.

[26]  Ron Y. Pinter, et al. Sorting by Length-Weighted Reversals: Dealing with Signs and Circularity, 2004, CPM.

[27]  Eduardo Sany Laber, et al. LATIN 2008: Theoretical Informatics, 8th Latin American Symposium, Búzios, Brazil, April 7-11, 2008, Proceedings, 2008, Lecture Notes in Computer Science.

[28]  Ely Porat, et al. Approximate string matching with stuck address bits, 2010, Theor. Comput. Sci.

[29]  Arthur Cayley. The Collected Mathematical Papers: Note on the Theory of Permutations, 2009.

[30]  Richard Cole, et al. Verifying candidate matches in sparse and wildcard matching, 2002, STOC '02.

[31]  David A. Christie, et al. Sorting Permutations by Block-Interchanges, 1996, Inf. Process. Lett.

[32]  Jacob T. Schwartz, et al. Fast Probabilistic Algorithms for Verification of Polynomial Identities, 1980, J. ACM.

[33]  Moshe Lewenstein, et al. Overlap matching, 2001, SODA '01.

[34]  Piotr Indyk, et al. Efficient computations of l1 and l∞ rearrangement distances, 2009, Theor. Comput. Sci.

[35]  Richard Zippel, et al. Probabilistic algorithms for sparse polynomials, 1979, EUROSAM.

[36]  Arnold L. Rosenberg, et al. Rapid identification of repeated patterns in strings, trees and arrays, 1972, STOC.

[37]  David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.