Solving Classical String Problems on Compressed Texts

We study the complexity of string problems as a function of the size of a program that generates the input. We consider straight-line programs (SLPs), since all algorithms on SLP-generated strings can also be applied to processing LZ-compressed texts. The main result is a new algorithm for pattern matching when both a text T and a pattern P are given by SLPs (the so-called fully compressed pattern matching problem). We show how to find the first occurrence, count all occurrences, and check whether any given position is an occurrence, each in time O(n^2 m), where m and n are the sizes of the straight-line programs generating P and T, respectively. We then present polynomial algorithms for computing the fingerprint table and a compressed representation of all covers (for the first time), and for finding the periods of a given compressed string (faster than previously known algorithms). On the other hand, we show that computing the Hamming distance between two SLP-generated strings is NP-hard and coNP-hard.
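To make the setting concrete, here is a minimal sketch of how an SLP can be stored and queried without full decompression. It assumes a representation not taken from the paper: a list of rules where each rule is either a single character or a pair (i, j) of indices of earlier rules (meaning X_k -> X_i X_j), with the last rule generating the whole string. The helper names `lengths`, `char_at` and `occurs_at` are hypothetical. The occurrence check is a deliberately naive baseline that compares |P| characters one at a time, so its cost grows with the uncompressed pattern length. It is not the O(n^2 m) fully compressed algorithm described above.

```python
from typing import List, Tuple, Union

# Assumed SLP format: a rule is a character or a pair (i, j) of earlier rule
# indices, meaning X_k -> X_i X_j. The last rule generates the whole string.
Rule = Union[str, Tuple[int, int]]


def lengths(slp: List[Rule]) -> List[int]:
    """Length of the string generated by each rule, in O(len(slp)) additions."""
    out: List[int] = []
    for rule in slp:
        if isinstance(rule, str):
            out.append(1)
        else:
            i, j = rule
            out.append(out[i] + out[j])
    return out


def char_at(slp: List[Rule], lens: List[int], pos: int) -> str:
    """Character at 0-based position pos, found by descending from the last
    rule; the number of steps is bounded by the height of the SLP."""
    k = len(slp) - 1
    if not 0 <= pos < lens[k]:
        raise IndexError(pos)
    while True:
        rule = slp[k]
        if isinstance(rule, str):
            return rule
        i, j = rule
        if pos < lens[i]:
            k = i          # position lies in the left part X_i
        else:
            pos -= lens[i]  # skip the left part and continue in X_j
            k = j


def occurs_at(text: List[Rule], pattern: List[Rule], pos: int) -> bool:
    """Naive baseline: does the pattern occur in the text starting at pos?
    Compares |P| characters, so it is NOT polynomial in the SLP sizes."""
    lt, lp = lengths(text), lengths(pattern)
    m_len, n_len = lp[-1], lt[-1]
    if pos < 0 or pos + m_len > n_len:
        return False
    return all(
        char_at(text, lt, pos + q) == char_at(pattern, lp, q)
        for q in range(m_len)
    )


if __name__ == "__main__":
    T = ["a", "b", (1, 0), (2, 1), (3, 2), (4, 3)]  # generates "babbabab"
    P = ["a", "b", (0, 1)]                          # generates "ab"
    print([i for i in range(lengths(T)[-1]) if occurs_at(T, P, i)])  # [1, 4, 6]
```

In this toy example the first occurrence is at position 1 and there are three occurrences in total. The point of the fully compressed algorithm is to answer such queries in time polynomial in the rule counts m and n rather than in the string lengths.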

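Working with SLP sizes rather than string lengths matters because an SLP can be exponentially shorter than the string it generates: k + 1 rules suffice for a^(2^k). The sketch below, under the same assumed rule format (character, or pair (i, j) of earlier rule indices), shows the naive alternative. It fully decompresses the strings and then computes the smallest period with the KMP failure function and the Hamming distance by direct comparison. The names `expand`, `doubling_slp`, `smallest_period` and `hamming` are hypothetical. These are decompress-first baselines for illustration only, not the paper's polynomial period algorithm. Given the NP- and coNP-hardness result above, a polynomial-time algorithm in the SLP sizes is not expected for the Hamming distance.

```python
from typing import List, Tuple, Union

# Assumed SLP format: a character, or a pair (i, j) meaning X_k -> X_i X_j.
Rule = Union[str, Tuple[int, int]]


def expand(slp: List[Rule]) -> str:
    """Fully decompress an SLP bottom-up; the output can be exponentially
    longer than len(slp)."""
    parts: List[str] = []
    for rule in slp:
        if isinstance(rule, str):
            parts.append(rule)
        else:
            i, j = rule
            parts.append(parts[i] + parts[j])
    return parts[-1]


def doubling_slp(k: int) -> List[Rule]:
    """k + 1 rules generating 'a' * 2**k: X_0 = a, X_t = X_{t-1} X_{t-1}."""
    slp: List[Rule] = ["a"]
    for t in range(1, k + 1):
        slp.append((t - 1, t - 1))
    return slp


def smallest_period(s: str) -> int:
    """Smallest p >= 1 with s[q] == s[q + p] for all valid q, computed as
    len(s) minus the longest proper border (KMP failure function)."""
    fail = [0] * len(s)
    for q in range(1, len(s)):
        b = fail[q - 1]
        while b > 0 and s[q] != s[b]:
            b = fail[b - 1]
        if s[q] == s[b]:
            b += 1
        fail[q] = b
    return len(s) - fail[-1] if s else 0


def hamming(s: str, t: str) -> int:
    """Number of positions where two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs equal lengths")
    return sum(a != b for a, b in zip(s, t))


if __name__ == "__main__":
    T = ["a", "b", (1, 0), (2, 1), (3, 2), (4, 3)]  # "babbabab"
    U = ["a", "b", (0, 1), (2, 2), (3, 3)]          # "abababab"
    print(smallest_period(expand(T)))                # 5
    print(hamming(expand(T), expand(U)))             # 3
    print(len(expand(doubling_slp(20))))             # 1048576, from 21 rules
```

Both functions run in time linear in the decompressed length, which is fine for toy inputs but can be exponential in the number of rules.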