Dynamic Path-decomposed Tries

A keyword dictionary is an associative array whose keys are strings. Recent applications that handle massive keyword dictionaries in main memory need a space-efficient implementation. For static applications, a number of highly compressed keyword dictionaries exist, built on advances in practical succinct data structures. However, because most succinct data structures are efficient only in the static case, it remains difficult to implement a keyword dictionary that is both space efficient and dynamic. In this article, we propose such a keyword dictionary. Our main idea is to adopt the path decomposition technique, which was proposed for constructing cache-friendly tries. To store the path-decomposed trie in little memory, we design data structures based on recent compact hash trie representations. Experiments on real-world datasets show that our dynamic keyword dictionary needs up to 68% less space than the smallest existing ones, while achieving a relevant space-time tradeoff.
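To make the idea of a dynamic path-decomposed trie concrete, the following is a minimal sketch and not the data structure proposed in the article. It assumes an incremental decomposition in which each inserted key contributes exactly one node holding the unmatched suffix of that key, and each edge is identified by the mismatch offset and the branching character. Nodes are plain Python objects with `dict` children rather than a compact hash trie representation, and the names `PathDecomposedTrie`, `Node`, `lcp`, and `TERMINATOR` are hypothetical.

```python
TERMINATOR = "\0"  # assumed end marker; keeps the key set prefix-free


def lcp(a, b):
    """Return the length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


class Node:
    def __init__(self, label, value):
        self.label = label    # key suffix stored along this node's path
        self.value = value    # value associated with the key ending here
        self.children = {}    # (mismatch offset, branching char) -> Node


class PathDecomposedTrie:
    def __init__(self):
        self.root = None
        self.num_nodes = 0

    @staticmethod
    def _terminated(key):
        if TERMINATOR in key:
            raise ValueError("keys must not contain the terminator")
        return key + TERMINATOR

    def insert(self, key, value):
        rest = self._terminated(key)
        if self.root is None:
            self.root = Node(rest, value)
            self.num_nodes = 1
            return
        node = self.root
        while True:
            i = lcp(rest, node.label)
            if i == len(rest):          # key already present: update it
                node.value = value
                return
            edge = (i, rest[i])         # where and how the key leaves the path
            child = node.children.get(edge)
            if child is None:           # new path holding the remaining suffix
                node.children[edge] = Node(rest[i + 1:], value)
                self.num_nodes += 1
                return
            node, rest = child, rest[i + 1:]

    def get(self, key, default=None):
        rest = self._terminated(key)
        node = self.root
        while node is not None:
            i = lcp(rest, node.label)
            if i == len(rest):          # whole remaining key matched the label
                return node.value
            node, rest = node.children.get((i, rest[i])), rest[i + 1:]
        return default


trie = PathDecomposedTrie()
for number, word in enumerate(["technology", "technique", "tech", "trie", "idea"]):
    trie.insert(word, number)
```

In this sketch a lookup visits one node per path it crosses instead of one node per character, which is the property that makes path decomposition cache-friendly. The number of nodes equals the number of distinct keys.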
