Order-Preserving Key Compression for In-Memory Search Trees

We present the High-speed Order-Preserving Encoder (HOPE) for in-memory search trees. HOPE is a fast dictionary-based compressor that encodes arbitrary keys while preserving their order. HOPE identifies common key patterns at a fine granularity and exploits their entropy to achieve high compression rates with a small dictionary. We first develop a theoretical model to reason about order-preserving dictionary designs. We then use this model to select six representative compression schemes and implement them in HOPE. These schemes make different trade-offs between compression rate and encoding speed. We evaluate HOPE on five data structures used in databases: SuRF, ART, HOT, B+tree, and Prefix B+tree. Our experiments show that, for most string key workloads, HOPE lets the search trees achieve both lower query latency (up to 40% lower) and better memory efficiency (up to 30% smaller).
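To make the idea of an order-preserving dictionary concrete, the following minimal Python sketch builds a single-byte dictionary from a sample of keys and assigns each byte a prefix-free bit string whose lexicographic order matches the byte order. It is an illustration only, not one of HOPE's six schemes: the function names `build_codes` and `encode` are hypothetical, the codes are chosen by a simple weight-balanced split rather than an optimal alphabetic code such as Hu-Tucker, and bit strings are kept as Python `str` values for readability rather than packed into bytes.

```python
from collections import Counter


def build_codes(sample_keys):
    """Return a list of 256 bit strings, one per byte value.

    The codes are prefix-free and sorted in byte order, so comparing
    encoded keys as strings gives the same result as comparing raw keys.
    Assumption: a weight-balanced split stands in for an optimal
    alphabetic code; it is simple, not minimal.
    """
    counts = Counter(b for key in sample_keys for b in key)
    # Add 1 so bytes absent from the sample still get a code.
    weights = [counts.get(b, 0) + 1 for b in range(256)]
    codes = [""] * 256

    def split(lo, hi, prefix):
        # Assign codes to byte values in [lo, hi).
        if hi - lo == 1:
            codes[lo] = prefix
            return
        total = sum(weights[lo:hi])
        running = 0
        best_cut, best_gap = lo + 1, total
        for i in range(lo, hi - 1):
            running += weights[i]
            gap = abs(total - 2 * running)
            if gap < best_gap:
                best_cut, best_gap = i + 1, gap
        # Smaller byte values go left ("0"), larger go right ("1").
        split(lo, best_cut, prefix + "0")
        split(best_cut, hi, prefix + "1")

    split(0, 256, "")
    return codes


def encode(key, codes):
    """Concatenate the per-byte codes of a bytes key."""
    return "".join(codes[b] for b in key)


if __name__ == "__main__":
    sample = [b"com.gmail@alice", b"com.gmail@bob", b"com.yahoo@carol"]
    codes = build_codes(sample)
    keys = [b"com.yahoo@carol", b"com.gmail@alice", b"org.acm@dave"]
    by_raw = sorted(keys)
    by_encoded = sorted(keys, key=lambda k: encode(k, codes))
    print(by_raw == by_encoded)
```

Because frequent bytes sit closer to the root of the split, they receive shorter codes; how much space this saves depends on how skewed the sample is. A search tree could store the encoded keys and answer point and range queries by encoding the query key with the same dictionary.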
