Compressed Automata for Dictionary Matching

A variant of the dictionary matching problem is addressed in which the dictionary is given in SLP-compressed (straight-line program) form. An algorithm based on the Aho-Corasick automaton is presented that preprocesses the compressed dictionary $\mathcal{D}$ in $O(n^4 \log n)$ time using $O(n^2 \log N)$ space and recognizes all occurrences of the patterns in $\mathcal{D}$ in amortized $O(h+m)$ time per character, where $n$ and $N$ are, respectively, the compressed and uncompressed sizes of $\mathcal{D}$, $h$ is the height of $\mathcal{D}$, and $m$ is the number of patterns in the dictionary.
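
To make the setting concrete, the following is a minimal sketch of the uncompressed baseline: it expands each pattern from a toy SLP and then runs a textbook Aho-Corasick automaton over the expanded patterns. It is not the compressed-automaton algorithm summarized above, which avoids this expansion. The SLP encoding (a dict mapping each variable to a pair of symbols) and the names `expand`, `build_automaton`, and `find_all` are assumptions made for illustration only.

```python
from collections import deque


def expand(slp, symbol, memo=None):
    """Expand an SLP symbol into the string it derives.

    Assumed encoding: slp maps a variable to a pair (left, right);
    any symbol that is not a key of slp is treated as a terminal.
    """
    if memo is None:
        memo = {}
    if symbol not in slp:
        return symbol
    if symbol not in memo:
        left, right = slp[symbol]
        memo[symbol] = expand(slp, left, memo) + expand(slp, right, memo)
    return memo[symbol]


def build_automaton(patterns):
    """Build goto, failure, and output tables for the given patterns."""
    goto, fail, out = [{}], [0], [[]]
    # Phase 1: insert every pattern into a trie.
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)
    # Phase 2: compute failure links in breadth-first order.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_position, pattern_index) for every occurrence."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((i - len(patterns[idx]) + 1, idx))
    return hits


# Toy SLP: X1 -> ab, X2 -> X1 a = aba, X3 -> X2 X1 = abaab
slp = {"X1": ("a", "b"), "X2": ("X1", "a"), "X3": ("X2", "X1")}
patterns = [expand(slp, "X1"), expand(slp, "X2")]
print(find_all(expand(slp, "X3"), patterns))  # [(0, 0), (0, 1), (3, 0)]
```

Expanding the patterns costs time and space proportional to the uncompressed size $N$, which can be exponential in $n$; that cost is what motivates working on the compressed representation directly.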
