Engineering scalable, cache and space efficient tries for strings

Storing and retrieving strings in main memory is a fundamental problem in computer science. The efficiency of the string data structures used for this task is of paramount importance for applications such as in-memory databases, text-based search engines, and dictionaries. The burst trie is a leading choice for such tasks, as it can provide fast sorted access to strings. The burst trie, however, uses linked lists as substructures, which can result in poor use of CPU cache and main memory. Previous research addressed this issue by replacing the linked lists with dynamic arrays, forming a cache-conscious array burst trie. Though faster, this variant can incur high instruction costs, which can hinder its efficiency. Thus, engineering a fast, compact, and scalable trie for strings remains an open problem. In this paper, we introduce a novel and practical solution that carefully combines a trie with a hash table, creating a variant of the burst trie called the HAT-trie. We provide a thorough experimental analysis which demonstrates that, for large sets of strings and on alternative computing architectures, the HAT-trie, together with two novel variants engineered to achieve further space efficiency, is currently the leading in-memory trie-based data structure, offering rapid, compact, and scalable storage and retrieval of variable-length strings.
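The general idea of combining a trie with hash-table containers can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the class names, the `threshold` parameter, and the use of a Python `set` as the container are all assumptions made for illustration, and the sketch ignores the cache-conscious memory layout that the actual structure depends on. Each trie node maps a character either to another node or to a hash-table bucket of suffixes, and a bucket that grows past the threshold is "burst" into a new node with smaller buckets.

```python
class Node:
    """Trie node: maps one character to a child Node or to a bucket (set)."""
    def __init__(self):
        self.children = {}     # char -> Node | set of remaining suffixes
        self.terminal = False  # True if a key ends exactly at this node


class HatTrieSketch:
    """Illustrative burst trie whose containers are hash tables (Python sets)."""

    def __init__(self, threshold=4):  # threshold is a hypothetical tuning knob
        self.root = Node()
        self.threshold = threshold

    def insert(self, key):
        node, i = self.root, 0
        while i < len(key):
            ch = key[i]
            child = node.children.get(ch)
            if child is None:
                # First key under this character: start a new bucket.
                node.children[ch] = {key[i + 1:]}
                return
            if isinstance(child, set):
                child.add(key[i + 1:])
                if len(child) > self.threshold:
                    node.children[ch] = self._burst(child)
                return
            node, i = child, i + 1
        node.terminal = True

    def _burst(self, bucket):
        # Replace a full bucket by a node; redistribute suffixes by first char.
        # A resulting bucket may still be over the threshold; it is simply
        # burst again on the next insertion that reaches it.
        new = Node()
        for suffix in bucket:
            if suffix == "":
                new.terminal = True
            else:
                new.children.setdefault(suffix[0], set()).add(suffix[1:])
        return new

    def __contains__(self, key):
        node = self.root
        for i, ch in enumerate(key):
            child = node.children.get(ch)
            if child is None:
                return False
            if isinstance(child, set):
                return key[i + 1:] in child
            node = child
        return node.terminal

    def sorted_keys(self):
        out = []
        self._walk(self.root, "", out)
        return out

    def _walk(self, node, prefix, out):
        if node.terminal:
            out.append(prefix)
        for ch in sorted(node.children):
            child = node.children[ch]
            if isinstance(child, set):
                # Buckets are unordered, so they are sorted only when traversed.
                out.extend(prefix + ch + s for s in sorted(child))
            else:
                self._walk(child, prefix + ch, out)
```

In this sketch the trie levels keep keys partitioned by prefix, so sorted access only requires sorting each small bucket during traversal, while lookups inside a bucket are a single hash probe.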
