Succinct 2D Dictionary Matching

The dictionary matching problem seeks all locations in a given text that match any of the patterns in a given dictionary. Efficient algorithms for dictionary matching scan the text once, searching for all patterns simultaneously. Existing algorithms that solve the 2-dimensional dictionary matching problem all require working space proportional to the size of the dictionary. This paper presents the first efficient 2-dimensional dictionary matching algorithm that operates in small space. Given $d$ patterns, $D = \{P_1, \ldots, P_d\}$, each of size $m \times m$, and a text $T$ of size $n \times n$, our algorithm finds all occurrences of $P_i$, $1 \le i \le d$, in $T$. The preprocessing of the dictionary forms a compressed self-index of the patterns, after which the original dictionary may be discarded. Our algorithm uses $O(dm \log(dm))$ extra bits of space. The time complexity of our algorithm is close to linear, $O(dm^2 + n^2 \tau \log \sigma)$, where $\tau$ is the time it takes to access a character in the compressed self-index and $\sigma$ is the size of the alphabet. Using recent results, $\tau$ is at most sub-logarithmic.
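
To make the problem and the "scan the text once for all patterns" idea concrete, here is a minimal Python sketch of the classical linear-space approach usually credited to Bird and Baker. It is not the succinct algorithm described above: it keeps the full dictionary and an n×n matrix of row names, which is exactly the kind of working space the paper avoids. The sketch assumes all patterns are m×m and given as lists of strings. It names each distinct pattern row, uses an Aho-Corasick automaton to label every text position with the name of the length-m pattern row ending there, and then runs a second Aho-Corasick automaton down each column over those names. The function name `dictionary_match_2d` and its output format (top row, left column, pattern index) are hypothetical choices for illustration.

```python
from collections import deque
from typing import Dict, Hashable, List, Sequence, Tuple


class AhoCorasick:
    """Aho-Corasick automaton over arbitrary hashable symbols."""

    def __init__(self, keywords: Sequence[Sequence[Hashable]]) -> None:
        self.goto: List[Dict[Hashable, int]] = [{}]
        self.fail: List[int] = [0]
        self.out: List[List[int]] = [[]]  # keyword ids recognized at each state
        for kid, word in enumerate(keywords):
            state = 0
            for sym in word:
                if sym not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][sym] = len(self.goto) - 1
                state = self.goto[state][sym]
            self.out[state].append(kid)
        # BFS from the depth-1 states: compute failure links, merge outputs.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for sym, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and sym not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(sym, 0)
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def step(self, state: int, sym: Hashable) -> int:
        """Follow failure links until `sym` can be consumed (or we hit the root)."""
        while state and sym not in self.goto[state]:
            state = self.fail[state]
        return self.goto[state].get(sym, 0)


def dictionary_match_2d(patterns: List[List[str]],
                        text: List[str]) -> List[Tuple[int, int, int]]:
    """Return sorted (top_row, left_col, pattern_id) for every occurrence."""
    m = len(patterns[0])
    # 1. Give each distinct pattern row a name (0, 1, 2, ... in first-seen order).
    row_names: Dict[str, int] = {}
    for p in patterns:
        for row in p:
            row_names.setdefault(row, len(row_names))
    rows_ac = AhoCorasick(list(row_names))  # keyword id == row name

    # 2. names[i][j] = name of the pattern row equal to text[i][j-m+1 .. j], or -1.
    width = len(text[0])
    names = [[-1] * width for _ in text]
    for i, trow in enumerate(text):
        state = 0
        for j, ch in enumerate(trow):
            state = rows_ac.step(state, ch)
            for kid in rows_ac.out[state]:  # at most one, since all rows have length m
                names[i][j] = kid

    # 3. Each pattern becomes a 1D string of m row names; match down every column.
    col_ac = AhoCorasick([[row_names[r] for r in p] for p in patterns])
    hits = []
    for j in range(width):
        state = 0
        for i in range(len(text)):
            state = col_ac.step(state, names[i][j])
            for pid in col_ac.out[state]:
                hits.append((i - m + 1, j - m + 1, pid))
    return sorted(hits)


if __name__ == "__main__":
    pats = [["ab", "ba"], ["aa", "aa"]]
    txt = ["abab", "baba", "aaaa", "aaab"]
    print(dictionary_match_2d(pats, txt))
```

Run on the small example in the `__main__` block, the sketch should report pattern 0 at (0, 0) and (0, 2) and pattern 1 at (2, 0) and (2, 1). Its time is linear in the dictionary and text sizes for a fixed alphabet, but both automata and the name matrix are stored explicitly, so it illustrates the problem rather than the paper's small-space solution.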
