Algorithms on Compressed Strings and Arrays

We survey complexity issues for several algorithmic problems on compressed one- and two-dimensional texts, solved without explicit decompression: pattern matching, equality testing, computation of regularities, subsegment extraction, language membership, and solvability of word equations. Our basic problem is one- and two-dimensional pattern matching together with its variations. For some types of compression the pattern-matching problems are infeasible (NP-hard); for other types they are solvable in polynomial time, and we discuss how to reduce the degree of the corresponding polynomials.
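To make "without explicit decompression" concrete, here is a minimal sketch of one of the simplest listed tasks, extracting a single symbol, on a text given as a straight-line program (a grammar that generates exactly one string). The list-of-rules representation and the helper names `expansion_lengths`, `char_at`, and `fibonacci_slp` are assumptions made for this illustration and are not taken from the survey.

```python
# Assumed representation: rules[i] is either a one-character string (terminal)
# or a pair (j, k) with j, k < i, meaning "expansion of j followed by expansion of k".

def expansion_lengths(rules):
    """Length of the string generated by each rule, computed bottom-up."""
    lens = []
    for rule in rules:
        if isinstance(rule, str):
            lens.append(1)
        else:
            left, right = rule
            lens.append(lens[left] + lens[right])
    return lens


def char_at(rules, lens, start, pos):
    """Return the symbol at index pos of the expansion of rule `start`.

    Walks down the grammar, so the cost is bounded by its depth,
    not by the (possibly exponential) length of the text.
    """
    if not 0 <= pos < lens[start]:
        raise IndexError("position outside the generated text")
    node = start
    while not isinstance(rules[node], str):
        left, right = rules[node]
        if pos < lens[left]:
            node = left
        else:
            pos -= lens[left]
            node = right
    return rules[node]


def fibonacci_slp(n):
    """Grammar for Fibonacci words: rule 0 = "b", rule 1 = "a", rule i = (i-1, i-2)."""
    rules = ["b", "a"]
    for i in range(2, n + 1):
        rules.append((i - 1, i - 2))
    return rules


if __name__ == "__main__":
    rules = fibonacci_slp(80)          # 81 rules describe a text of astronomical length
    lens = expansion_lengths(rules)
    print(len(rules), lens[80], char_at(rules, lens, 80, 10**15))
```

The grammar above has 81 rules but generates a text far too long to write out, yet each lookup touches at most one rule per level. The harder problems named in the abstract, such as pattern matching and equality testing, need considerably more machinery than this walk.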
