Compressed automata for dictionary matching

We address a variant of the dictionary matching problem in which the dictionary is represented by a straight-line program (SLP). For a given SLP-compressed dictionary $D$ of size $n$ and height $h$ representing $m$ patterns of total length $N$, we present an $O(n^2 \log N)$-size representation of the Aho-Corasick automaton that recognizes all occurrences of the patterns in $D$ in amortized $O(h + m)$ running time per character. We also propose an algorithm to construct this compressed Aho-Corasick automaton in $O(n^3 \log n \log N)$ time and $O(n^2 \log N)$ space. In the special case where $D$ represents only a single pattern, we present an $O(n \log N)$-size representation of the Morris-Pratt automaton that finds all occurrences of the pattern in amortized $O(h)$ running time per character, and we show how to construct this representation in $O(n^3 \log n \log N)$ time with $O(n^2 \log N)$ working space.
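To make the quantities in the abstract concrete, the following minimal sketch shows a toy SLP (a list of rules, each either a terminal or a pair of earlier variables), how its derived lengths and height can be computed without expansion, and the classical uncompressed Morris-Pratt matcher run on the expanded pattern. This is only the textbook baseline, not the compressed automaton described above: the rule encoding and the names `slp_lengths`, `slp_heights`, `slp_expand`, `mp_failure`, and `mp_search` are assumptions made for illustration, and the pattern is assumed to be non-empty.

```python
from typing import List, Tuple, Union

# Hypothetical SLP encoding: rule i is a terminal character or a pair
# (left, right) of indices of earlier rules.
Rule = Union[str, Tuple[int, int]]


def slp_lengths(rules: List[Rule]) -> List[int]:
    """Length of the string derived by each variable, without expanding it."""
    lengths: List[int] = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)
        else:
            lengths.append(lengths[rule[0]] + lengths[rule[1]])
    return lengths


def slp_heights(rules: List[Rule]) -> List[int]:
    """Height of each variable's derivation tree (terminals have height 0)."""
    heights: List[int] = []
    for rule in rules:
        if isinstance(rule, str):
            heights.append(0)
        else:
            heights.append(1 + max(heights[rule[0]], heights[rule[1]]))
    return heights


def slp_expand(rules: List[Rule], var: int) -> str:
    """Fully expand one variable; linear in the output length, so only
    suitable for small illustrative inputs."""
    derived: List[str] = []
    for rule in rules[: var + 1]:
        if isinstance(rule, str):
            derived.append(rule)
        else:
            derived.append(derived[rule[0]] + derived[rule[1]])
    return derived[var]


def mp_failure(pattern: str) -> List[int]:
    """Classical Morris-Pratt failure function; fail[0] is the sentinel -1."""
    fail = [0] * (len(pattern) + 1)
    fail[0] = -1
    k = -1
    for i, ch in enumerate(pattern):
        while k >= 0 and pattern[k] != ch:
            k = fail[k]
        k += 1
        fail[i + 1] = k
    return fail


def mp_search(pattern: str, text: str) -> List[int]:
    """Start positions of all occurrences of a non-empty pattern in text."""
    fail = mp_failure(pattern)
    m = len(pattern)
    occurrences: List[int] = []
    state = 0
    for pos, ch in enumerate(text):
        # Follow failure links until the current character can be consumed.
        while state >= 0 and (state == m or pattern[state] != ch):
            state = fail[state]
        state += 1
        if state == m:
            occurrences.append(pos - m + 1)
    return occurrences


# Toy SLP: X0 = a, X1 = b, X2 = X0 X1, X3 = X2 X0, X4 = X3 X2.
rules: List[Rule] = ["a", "b", (0, 1), (2, 0), (3, 2)]
pattern = slp_expand(rules, 3)
matches = mp_search(pattern, "abaababa")
```

In this sketch the automaton takes space linear in the expanded pattern length, whereas the representation announced in the abstract stays within $O(n \log N)$ for a single pattern at the cost of amortized $O(h)$ time per text character.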
