Cycle Detection and Correction

Assume that a natural cyclic phenomenon has been measured, but the data is corrupted by errors. The type of corruption is application-dependent and may be caused by measurement errors or by natural features of the phenomenon. This paper studies the problem of recovering the correct cycle from data corrupted under various error models, formally defined as the period recovery problem. Specifically, we define a metric property which we call pseudo-locality and study the period recovery problem under pseudo-local metrics. Examples of pseudo-local metrics are the Hamming distance, the swap distance, and the interchange (or Cayley) distance. We show that for pseudo-local metrics, periodicity is a powerful property that allows detecting the original cycle and correcting the data, under suitable conditions. A surprising feature of our algorithm is that it efficiently identifies the period in the corrupted data, up to a number of candidates logarithmic in the length of the data string, even for metrics whose computation is NP-hard. For the Hamming metric we can reconstruct the corrupted data in near-linear time even for unbounded alphabets. This result is achieved using the property of separation in the self-convolution vector and Reed-Solomon codes. Finally, we employ our techniques beyond the scope of pseudo-local metrics and give a recovery algorithm for the Levenshtein edit metric, which is not pseudo-local.
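
To make the period recovery problem concrete for the Hamming metric, the following is a minimal brute-force sketch in Python. It is not the near-linear algorithm described above (it uses neither the self-convolution vector nor Reed-Solomon codes); it simply tries every candidate period length and takes a majority vote per position. The function names `hamming_period_candidates` and `correct`, and the error budget `max_errors`, are assumptions introduced only for illustration.

```python
from collections import Counter


def hamming_period_candidates(s, max_errors):
    """Brute-force illustration (hypothetical helper, not the paper's method).

    For every candidate period length p <= len(s) // 2, build the period by
    majority vote over the positions congruent to r modulo p, and count how
    many symbols of s disagree with that vote. Candidates that explain s
    with at most max_errors mismatches are returned as
    (period_length, period, errors), in increasing order of length.
    """
    n = len(s)
    results = []
    for p in range(1, n // 2 + 1):
        period = []
        errors = 0
        for r in range(p):
            column = s[r::p]  # all symbols at positions r, r+p, r+2p, ...
            symbol, count = Counter(column).most_common(1)[0]
            period.append(symbol)
            errors += len(column) - count  # symbols that lose the vote
        if errors <= max_errors:
            results.append((p, "".join(period), errors))
    return results


def correct(s, period):
    """Rewrite s as the exact repetition of period (truncated to len(s))."""
    return "".join(period[i % len(period)] for i in range(len(s)))


if __name__ == "__main__":
    corrupted = "abcabcabxabcabc"  # "abc" repeated, one symbol corrupted
    candidates = hamming_period_candidates(corrupted, max_errors=1)
    shortest_length, shortest_period, _ = candidates[0]
    print(candidates)
    print(correct(corrupted, shortest_period))
```

Multiples of a valid period length also pass the error threshold, so the sketch can return several candidates; picking the shortest one is a natural choice here. The running time is quadratic in the length of the string, which is the cost the near-linear result above avoids.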
