A unifying framework for compressed pattern matching

We introduce a general framework that captures the essence of compressed pattern matching across various dictionary-based compression methods, and we propose a compressed pattern matching algorithm for the framework. The goal is to find all occurrences of a pattern in a compressed text without decompressing it, a problem that is one of the most active topics in string matching. Our framework covers compression methods such as the Lempel-Ziv family (LZ77, LZSS, LZ78, LZW) (J. Ziv and A. Lempel, 1978), byte-pair encoding, and the static dictionary-based method. Technically, our pattern matching algorithm extends the one for LZW-compressed text presented by A. Amir et al. (1996).
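To make the problem concrete, here is a minimal Python sketch of pattern matching over a dictionary-compressed text. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the compressed text is assumed to be a list of tokens, each token expanding either to one character or to the concatenation of two other tokens (as in byte-pair encoding), and the names `rules`, `tokens`, `build_kmp_step`, and `search_compressed` are hypothetical. The idea is to run a Knuth-Morris-Pratt automaton over tokens and cache, for each (state, token) pair, the resulting state and the match positions inside the token, so that repeated tokens are not rescanned character by character.

```python
from functools import lru_cache


def build_kmp_step(pattern):
    """Return a function step(state, ch) for the KMP automaton of `pattern`."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    fail = [0] * (m + 1)  # fail[i] = longest proper border of pattern[:i]
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i + 1] = k

    def step(state, ch):
        # Fall back along failure links until ch can extend the match.
        while state == m or (state > 0 and pattern[state] != ch):
            state = fail[state]
        if pattern[state] == ch:
            state += 1
        return state

    return step


def search_compressed(tokens, rules, pattern):
    """Return 0-based start positions of `pattern` in the expanded text.

    `rules` maps a token to a single character (str) or to a pair of
    tokens (tuple); this representation is an assumption of the sketch.
    """
    step = build_kmp_step(pattern)
    m = len(pattern)

    @lru_cache(maxsize=None)
    def length(tok):
        rule = rules[tok]
        return 1 if isinstance(rule, str) else length(rule[0]) + length(rule[1])

    @lru_cache(maxsize=None)
    def run(state, tok):
        # Returns (next_state, ends); ends are 1-based offsets inside the
        # expansion of tok at which an occurrence of the pattern ends.
        rule = rules[tok]
        if isinstance(rule, str):
            nxt = step(state, rule)
            return nxt, ((1,) if nxt == m else ())
        left, right = rule
        mid, ends_left = run(state, left)
        nxt, ends_right = run(mid, right)
        shift = length(left)
        return nxt, ends_left + tuple(e + shift for e in ends_right)

    state, pos, hits = 0, 0, []
    for tok in tokens:
        state, ends = run(state, tok)
        hits.extend(pos + e - m for e in ends)
        pos += length(tok)
    return hits


# Hypothetical example: the tokens below expand to "abababcab".
rules = {"a": "a", "b": "b", "c": "c", "X": ("a", "b"), "Y": ("X", "X")}
tokens = ["Y", "X", "c", "X"]
print(search_compressed(tokens, rules, "ab"))  # [0, 2, 4, 7]
```

The cache plays the role of a precomputed transition table over dictionary entries: each (state, token) pair is evaluated at most once, although the recursion and the stored offset tuples make this sketch suitable only for small inputs.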

[1] Ayumi Shinohara, et al. An Improved Pattern Matching Algorithm for Strings in Terms of Straight-Line Programs, 1997, CPM.

[2] Z. Galil, et al. Pattern matching algorithms, 1997.

[3] Ricardo A. Baeza-Yates, et al. Direct pattern matching on compressed text, 1998, Proceedings. String Processing and Information Retrieval: A South American Symposium.

[4] S. Arikawa, et al. Byte Pair Encoding: a Text Compression Scheme That Accelerates Pattern Matching, 1999.

[5] Wojciech Rytter, et al. An Efficient Pattern-Matching Algorithm for Strings with Short Descriptions, 1997, Nord. J. Comput.

[6] Ayumi Shinohara, et al. Pattern Matching in Text Compressed by Using Antidictionaries, 1999, CPM.

[7] Wojciech Plandowski, et al. Efficient algorithms for Lempel-Ziv encoding, 1996.

[8] James A. Storer, et al. Data compression via textual substitution, 1982, JACM.

[9] Abraham Lempel, et al. Compression of individual sequences via variable-rate coding, 1978, IEEE Trans. Inf. Theory.

[10] S. Rao Kosaraju. Pattern Matching in Compressed Texts, 1995, FSTTCS.

[11] Terry A. Welch, et al. A Technique for High-Performance Data Compression, 1984, Computer.

[12] Philip Gage, et al. A new algorithm for data compression, 1994.

[13] Ayumi Shinohara, et al. Shift-And Approach to Pattern Matching in LZW Compressed Text, 1999, CPM.

[14] Gary Benson, et al. Let sleeping files lie: pattern matching in Z-compressed files, 1994, SODA '94.

[15] Lucas Chi Kwong Hui, et al. Color Set Size Problem with Application to String Matching, 1992, CPM.

[16] Wojciech Rytter, et al. Text Algorithms, 1994.

[17] Ricardo A. Baeza-Yates, et al. Fast searching on compressed text allowing errors, 1998, SIGIR '98.

[18] Gonzalo Navarro, et al. A General Practical Approach to Pattern Matching over Ziv-Lempel Compressed Text, 1999, CPM.

[19] Mark Nelson, et al. The Data Compression Book, 2009.

[20] Ayumi Shinohara, et al. Multiple pattern matching in LZW compressed text, 1998, Proceedings DCC '98 Data Compression Conference.