Order-Preserving Key Compression for In-Memory Search Trees

We present the High-speed Order-Preserving Encoder (HOPE) for in-memory search trees. HOPE is a fast dictionary-based compressor that encodes arbitrary keys while preserving their order. HOPE's approach is to identify common key patterns at a fine granularity and exploit their entropy to achieve high compression rates with a small dictionary. We first develop a theoretical model to reason about order-preserving dictionary designs. We then use this model to select six representative compression schemes and implement them in HOPE. These schemes make different trade-offs between compression rate and encoding speed. We evaluate HOPE on five data structures used in databases: SuRF, ART, HOT, B+tree, and Prefix B+tree. Our experiments show that, for most string key workloads, HOPE lets the search trees achieve lower query latency (up to 40% lower) and better memory efficiency (up to 30% smaller) simultaneously.
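The abstract states that HOPE's dictionary captures common key patterns while keeping encoded keys in the same order as the originals, but it does not spell out a dictionary layout. The following Python sketch is a hypothetical toy scheme under stated assumptions, not one of HOPE's six schemes or its API. It splits the key space into sorted intervals, gives each interval the common prefix of every string in it as its symbol, and assigns fixed-length codes in increasing interval order. The dictionary contents (every single byte plus a caller-chosen set of 2-byte grams) and the names `build_dictionary` and `encode` are illustrative assumptions.

```python
from bisect import bisect_right


def build_dictionary(bigrams):
    """Build sorted interval boundaries and the symbol for each interval.

    Hypothetical toy scheme: every single byte starts an interval, and each
    chosen 2-byte gram g gets its own interval [g, successor(g)).
    """
    chosen = {bytes(g) for g in bigrams if len(g) == 2}
    bounds = {bytes([b]) for b in range(256)}
    for g in chosen:
        bounds.add(g)
        if g[1] < 0xFF:
            # Strings at or above this successor no longer start with g.
            bounds.add(bytes([g[0], g[1] + 1]))
    bounds = sorted(bounds)
    # Each interval's symbol is the prefix shared by every string inside it:
    # the gram itself for chosen grams, otherwise just the first byte.
    symbols = [b if b in chosen else b[:1] for b in bounds]
    return bounds, symbols


def encode(key, bounds, symbols):
    """Greedily parse key into symbols and emit one fixed-width code each."""
    width = max(1, (len(bounds) - 1).bit_length())
    out = []
    pos = 0
    while pos < len(key):
        rest = key[pos:]
        i = bisect_right(bounds, rest) - 1  # interval that contains rest
        sym = symbols[i]
        assert rest.startswith(sym)  # holds by construction of the intervals
        out.append(format(i, f"0{width}b"))  # code grows with interval order
        pos += len(sym)
    return "".join(out)


if __name__ == "__main__":
    d = build_dictionary([b"th", b"in", b"er"])
    keys = sorted([b"a", b"t", b"the", b"there", b"thin", b"tin", b"inter"])
    codes = [encode(k, *d) for k in keys]
    assert codes == sorted(codes)  # bit strings sort like the original keys
    for k, c in zip(keys, codes):
        print(k, len(c), "bits")
```

Order is preserved because, for keys x < y, x falls in the same or an earlier interval than y. If the intervals differ, the first code already decides the comparison. If they match, both keys share that interval's symbol as a prefix and the argument repeats on the remainders, and a key that runs out first sorts first because its code string is a prefix of the other's. Fixed-length codes keep the toy simple; a practical encoder would pick grams from a key sample and could use variable-length order-preserving codes such as Hu-Tucker codes to reach better compression rates.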
