Packed Compact Tries: A Fast and Efficient Data Structure for Online String Processing

We present a new data structure called the packed compact trie (packed c-trie), which stores a set \(S\) of \(k\) strings of total length \(n\) in \(n \log \sigma + O(k \log n)\) bits of space and supports fast pattern matching queries and updates, where \(\sigma\) is the alphabet size. Assume that \(\alpha = \log_\sigma n\) letters are packed in a single machine word on the standard word RAM model, and let \(f(k, n)\) denote the query and update time of a dynamic predecessor/successor data structure of our choice that stores \(k\) integers from the universe \([1, n]\) in \(O(k \log n)\) bits of space. Then, given a string of length \(m\), our packed c-tries support pattern matching queries and insert/delete operations in \(O(\frac{m}{\alpha} f(k,n))\) worst-case time and in \(O(\frac{m}{\alpha} + f(k,n))\) expected time. Our experiments show that our packed c-tries are faster than standard compact tries (a.k.a. Patricia trees) on real data sets. We also discuss applications of our packed c-tries.
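The \(\frac{m}{\alpha}\) factor in these bounds reflects processing \(\alpha\) letters per word operation instead of one letter at a time. The following is a minimal sketch of that packing idea only, not the packed c-trie itself: it packs letters into integers and finds the longest common prefix of two strings one word at a time. The function names `pack` and `lcp_packed`, the DNA alphabet, and the choice of 4 letters per word are illustrative assumptions, and Python integers merely simulate fixed-width machine words.

```python
def pack(s, alphabet, alpha):
    """Pack s into integers holding alpha letters each (hypothetical helper).

    The first letter of each chunk occupies the most significant bits;
    the last word is padded with zero bits.
    """
    bits = max(1, (len(alphabet) - 1).bit_length())  # bits per letter
    code = {c: i for i, c in enumerate(alphabet)}
    words = []
    for i in range(0, len(s), alpha):
        chunk = s[i:i + alpha]
        w = 0
        for c in chunk:
            w = (w << bits) | code[c]
        w <<= bits * (alpha - len(chunk))  # zero-pad a short final chunk
        words.append(w)
    return words, bits


def lcp_packed(a_words, b_words, len_a, len_b, bits, alpha):
    """Longest common prefix length, comparing alpha letters per step."""
    limit = min(len_a, len_b)
    for i, (x, y) in enumerate(zip(a_words, b_words)):
        diff = x ^ y
        if diff:
            # Number of equal leading bits inside this word.
            equal_bits = alpha * bits - diff.bit_length()
            # Clamp to limit because padding bits are not real letters.
            return min(limit, i * alpha + equal_bits // bits)
    return limit


ALPHABET = "ACGT"   # assumed alphabet, sigma = 4, so 2 bits per letter
ALPHA = 4           # assumed number of letters per simulated word

a, b = "ACGTACGA", "ACGTACTT"
a_words, bits = pack(a, ALPHABET, ALPHA)
b_words, _ = pack(b, ALPHABET, ALPHA)
print(lcp_packed(a_words, b_words, len(a), len(b), bits, ALPHA))
```

Here the two strings are each stored in two words, and the mismatch is located with one XOR per word plus a bit-length computation, rather than by comparing the eight letters individually.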
