A fully compressed pattern matching algorithm for simple collage systems

We study the fully compressed pattern matching problem (FCPM problem): given ${\mathcal T}$ and ${\mathcal P}$, which are descriptions of a text $T$ and a pattern $P$ respectively, find the occurrences of $P$ in $T$ without decompressing ${\mathcal T}$ or ${\mathcal P}$. This problem is rather challenging since the pattern is also given in compressed form. In this paper we present an FCPM algorithm for simple collage systems. Collage systems are a general framework that represents various kinds of dictionary-based compression in a uniform way, and simple collage systems are a subclass that includes LZW and LZ78 compression. A collage system is of the form $\langle {\mathcal D}, {\mathcal S} \rangle$, where ${\mathcal D}$ is a dictionary and ${\mathcal S}$ is a sequence of variables from ${\mathcal D}$. Our FCPM algorithm runs in $O(\|{\mathcal D}\|^2 + mn \log |{\mathcal S}|)$ time, where $n = |{\mathcal T}| = \|{\mathcal D}\| + |{\mathcal S}|$ and $m = |{\mathcal P}|$. This is faster than the previous best result of O...
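To make the pair $\langle {\mathcal D}, {\mathcal S} \rangle$ concrete, the following is a minimal Python sketch of how an LZ78-style factorization can be viewed as a dictionary plus a sequence of variables. It is an illustration only, not the FCPM algorithm of the paper. The function names `lz78_collage`, `expand`, and `decompress`, and the encoding of each variable as a `(parent, char)` pair, are assumptions made for this example.

```python
def lz78_collage(text):
    """Factorize text LZ78-style and return (D, S).

    D maps a variable id to (parent id or None, appended character),
    so every variable is a single character or an earlier variable
    extended by one character. S is the sequence of variable ids
    whose expansions concatenate to the text.
    """
    D = {}        # variable id -> (parent id or None, char)
    index = {}    # (parent id or None, char) -> variable id
    S = []
    cur = None    # variable for the longest phrase matched so far
    for ch in text:
        key = (cur, ch)
        if key in index:
            cur = index[key]          # extend the current match
        else:
            vid = len(D)              # create a new variable
            D[vid] = key
            index[key] = vid
            S.append(vid)
            cur = None
    if cur is not None:
        S.append(cur)                 # trailing phrase already in D
    return D, S


def expand(D, var):
    """Return the string that variable var stands for."""
    chars = []
    while var is not None:
        var, ch = D[var]
        chars.append(ch)
    return "".join(reversed(chars))


def decompress(D, S):
    """Concatenate the expansions of the variables in S."""
    return "".join(expand(D, v) for v in S)


if __name__ == "__main__":
    D, S = lz78_collage("abababab")
    print(S)                  # [0, 1, 2, 3, 1]
    print(decompress(D, S))   # abababab
```

In this sketch, the number of entries in `D` plays the role of $\|{\mathcal D}\|$ and `len(S)` plays the role of $|{\mathcal S}|$. The decompression step is included only to check the representation; an FCPM algorithm works on `D` and `S` directly.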
