Compressed Key Sort and Fast Index Reconstruction

In this paper we propose an index key compression scheme based on the notion of distinction bits, and we prove that the distinction bits of index keys are sufficient to determine the sorted order of those keys correctly. Although the actual compression ratio varies with the characteristics of the dataset (our experiments showed an average compression ratio of 2.76:1), the scheme yields significant performance improvements when reconstructing large-scale indexes. It can be used effectively for database replication and index recovery in modern main-memory database systems.
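The claim that distinction bits suffice to determine sorted order can be illustrated with a small sketch. It assumes fixed-width unsigned integer keys with no duplicates, and it takes the distinction bit of two adjacent keys in sorted order to be the most significant bit position at which they differ. The helper names `distinction_bit`, `distinction_positions`, and `compress` are hypothetical and are not taken from the paper. Here the positions are derived by sorting the full keys, purely to demonstrate the property. In an index-reconstruction setting they would presumably come from an existing index instead. The property holds because, for sorted keys, the first differing bit of any two keys is the most significant of the adjacent distinction bits between them, so that bit always survives compression.

```python
import random

KEY_BITS = 32  # assumption: fixed-width unsigned keys


def distinction_bit(a: int, b: int, width: int = KEY_BITS) -> int:
    """Position (0 = most significant) of the first bit where distinct keys a and b differ."""
    return width - (a ^ b).bit_length()


def distinction_positions(sorted_keys, width: int = KEY_BITS):
    """Sorted set of distinction-bit positions over adjacent pairs of distinct sorted keys."""
    return sorted({distinction_bit(x, y, width) for x, y in zip(sorted_keys, sorted_keys[1:])})


def compress(key: int, positions, width: int = KEY_BITS) -> int:
    """Keep only the bits of key at the given positions, most significant first."""
    out = 0
    for p in positions:
        # Position p counted from the MSB is bit index (width - 1 - p) counted from the LSB.
        out = (out << 1) | ((key >> (width - 1 - p)) & 1)
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    keys = rng.sample(range(1 << 20), 1000)  # 1000 distinct keys, only the low 20 bits in use
    full_sorted = sorted(keys)
    positions = distinction_positions(full_sorted)
    # Sorting by the compressed keys reproduces the order of the full keys.
    compressed_sorted = sorted(keys, key=lambda k: compress(k, positions))
    assert compressed_sorted == full_sorted
    print(f"{len(positions)} of {KEY_BITS} bits kept; ratio {KEY_BITS / len(positions):.2f}:1")
```

In this sketch the compression ratio is simply the key width divided by the number of retained bit positions. Bits that never act as a distinction bit, such as the unused high-order bits here, are dropped without affecting the sort order.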
