Engineering Small Space Dictionary Matching

The dictionary matching problem is to locate occurrences of any pattern from a given set of patterns in a text. Massive data sets abound, yet in many settings working space is extremely limited. We introduce dictionary matching software for space-constrained environments whose running time is close to linear. Our algorithm uses the compressed suffix tree as its underlying data structure, so its working space is proportional to the optimal compression of the dictionary. We also contribute a succinct tool for performing constant-time lowest marked ancestor queries on a tree that is succinctly encoded as a sequence of balanced parentheses, with linear-time preprocessing of the tree. This tool should be useful in many other applications. Our source code is available online.
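
To make the lowest marked ancestor query concrete, the following is a minimal sketch in Python and not the succinct structure described above. It stores one answer per node in an ordinary table, so it uses far more space than a succinct encoding would, and it only illustrates the query semantics together with a single linear-time preprocessing pass. The function name `preprocess_lma`, the convention of identifying a node by the position of its opening parenthesis, and the choice that a marked node counts as its own lowest marked ancestor are all assumptions made for this illustration.

```python
def preprocess_lma(bp, marked):
    """Precompute the lowest marked ancestor for every node of a tree.

    bp     -- tree encoded as a balanced parentheses string, e.g. "(()())"
    marked -- set of marked nodes, each named by its opening-parenthesis index
    Returns a dict: node -> lowest marked ancestor-or-self, or -1 if none.
    (Hypothetical helper for illustration; not taken from the paper's code.)
    """
    lma = {}
    stack = []  # answer for each node on the current root-to-node path
    for i, c in enumerate(bp):
        if c == "(":
            inherited = stack[-1] if stack else -1
            current = i if i in marked else inherited
            lma[i] = current
            stack.append(current)
        else:
            stack.pop()  # leaving the subtree of the node on top of the stack
    return lma


# Tree: root 0 with children 1 and 7; node 1 has children 2 and 4.
bp = "((()())())"
table = preprocess_lma(bp, marked={1})
print(table[4])  # node 4 lies below marked node 1
print(table[7])  # node 7 has no marked ancestor
```

After preprocessing, each query is a single table lookup. A succinct solution would have to answer the same query while keeping only a small number of bits per node on top of the parentheses sequence.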
