Property and Equivalence Testing on Strings

We investigate property testing and related questions where, instead of the usual Hamming and edit distances between input strings, we consider the more relaxed edit distance with moves. Using a statistical embedding of words that has similarities with the Parikh mapping, we first construct a tolerant tester for the equality of two words, whose complexity is independent of the string size, and we derive an approximation algorithm for the normalized edit distance with moves. We then consider the question of testing whether a string is a member of a given language. We develop a method to compute, in time polynomial in the size of the representation, a geometric approximate description of a regular language as a finite union of polytopes. As an application, we obtain a new tester for regular languages given by nondeterministic finite automata (or regular expressions), whose complexity does not depend on the automaton, except for a polynomial-time preprocessing step. Furthermore, this method allows us to compare languages and validates the new notion of equivalence testing that we introduce. Using the geometric embedding, we can distinguish between a pair of automata that compute the same language and a pair of automata whose languages are not ε-equivalent in an appropriate sense. Our equivalence tester is deterministic and has polynomial time complexity, whereas the non-approximated version is PSPACE-complete. Last, we extend the geometric embedding, and hence the tester algorithms, to infinite regular languages and to context-free grammars as well. For context-free grammars the equivalence test now has exponential time complexity, but in comparison, the non-approximated version is not even recursively decidable.
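To make the idea of a statistical embedding concrete, the following Python sketch maps a word to the frequency vector of its length-k blocks and compares two words by the L1 distance between their vectors. The abstract does not spell out the embedding, so the choice of contiguous blocks, the block length `k`, and the function names `block_statistics`, `sampled_statistics`, and `l1_distance` are all assumptions made for illustration. For k = 1 the vector is the normalized Parikh vector of the word.

```python
import random
from collections import Counter


def block_statistics(word, k):
    """Exact frequency vector of the length-k contiguous blocks of `word`.

    Assumption: this is an illustrative stand-in for the statistical
    embedding mentioned in the abstract, not its exact definition.
    """
    n = len(word) - k + 1
    if n <= 0:
        raise ValueError("word is shorter than the block length k")
    counts = Counter(word[i:i + k] for i in range(n))
    return {block: c / n for block, c in counts.items()}


def sampled_statistics(word, k, samples, rng):
    """Estimate the block frequency vector from `samples` random positions.

    The number of positions read depends on `samples` and `k` only,
    not on the length of the word.
    """
    n = len(word) - k + 1
    if n <= 0:
        raise ValueError("word is shorter than the block length k")
    counts = Counter()
    for _ in range(samples):
        i = rng.randrange(n)
        counts[word[i:i + k]] += 1
    return {block: c / samples for block, c in counts.items()}


def l1_distance(p, q):
    """L1 distance between two sparse frequency vectors stored as dicts."""
    return sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in set(p) | set(q))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the run repeatable
    u = "ab" * 500
    v = "ba" * 500          # a single block move turns u into v
    w = "a" * 500 + "b" * 500
    est_u = sampled_statistics(u, 2, 400, rng)
    est_v = sampled_statistics(v, 2, 400, rng)
    est_w = sampled_statistics(w, 2, 400, rng)
    print("u vs v:", l1_distance(est_u, est_v))
    print("u vs w:", l1_distance(est_u, est_w))
```

In this sketch a tolerant equality test would accept when the estimated L1 distance falls below a threshold chosen as a function of ε; the threshold and the sample count needed for a given error probability are left open here because the abstract does not state them.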
