Fast & Strong: The Case of Compressed String Dictionaries on Modern CPUs

String dictionaries constitute a large portion of the memory footprint of database applications. While strong string dictionary compression algorithms exist, they come with impractical access and compression times, so lightweight algorithms such as front coding are favored in practice. This paper endeavors to make strong string dictionary compression practical. We focus on Re-Pair Front Coding (RPFC), a grammar-based compression algorithm, since it consistently offers better compression ratios than other algorithms in the literature. To accelerate compression, we propose block-based RPFC, which compresses small blocks of the dictionary independently. Moreover, to accelerate access, we devise a vectorized access method using Intel® Advanced Vector Extensions 512 (Intel® AVX-512), enabled by two specific changes we propose to RPFC. Our experimental evaluation shows that the proposed techniques accelerate compression and access times by up to 24x and 2.9x, respectively. These results move our modified RPFC into a practical range for use in database systems.
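For readers unfamiliar with front coding, the following is a minimal sketch of plain front coding applied to independent blocks of a sorted dictionary. It is not the RPFC method of this paper: there is no Re-Pair grammar step and no vectorization, and the names `encode`, `access`, and `block_size` are hypothetical, chosen only for illustration. The first string of each block is stored in full; each later string is stored as the length of the prefix it shares with its predecessor plus the remaining suffix.

```python
from typing import List, Tuple

# One block entry: (length of prefix shared with the previous string, suffix).
Block = List[Tuple[int, str]]


def common_prefix_len(a: str, b: str) -> int:
    """Return the number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def encode(sorted_strings: List[str], block_size: int = 4) -> List[Block]:
    """Front-code a sorted string list, one independent block at a time."""
    blocks: List[Block] = []
    for start in range(0, len(sorted_strings), block_size):
        chunk = sorted_strings[start:start + block_size]
        # The block header is stored uncompressed so blocks decode on their own.
        block: Block = [(0, chunk[0])]
        for prev, cur in zip(chunk, chunk[1:]):
            k = common_prefix_len(prev, cur)
            block.append((k, cur[k:]))
        blocks.append(block)
    return blocks


def access(blocks: List[Block], i: int, block_size: int = 4) -> str:
    """Recover the i-th string by decoding only the block that contains it."""
    block = blocks[i // block_size]
    s = ""
    for k, suffix in block[: i % block_size + 1]:
        # Keep the shared prefix of the previous string, then append the suffix.
        s = s[:k] + suffix
    return s
```

Because every block starts with a full string, a lookup touches one block only, and blocks can be built independently of each other; this is the property that a block-based scheme relies on, here shown under the simplifying assumptions stated above.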

[1]  Masao Fuketa,et al.  Practical String Dictionary Compression Using String Dictionary Encoding , 2017, 2017 International Conference on Big Data Innovations and Applications (Innovate-Data).

[2]  Nieves R. Brisaboa,et al.  Compressed String Dictionaries , 2011, SEA.

[3]  Sven Helmer,et al.  The implementation and performance of compressed databases , 2000, SGMD.

[4]  David Richard Clark,et al.  Compact pat trees , 1998 .

[5]  Ingo Müller,et al.  Adaptive String Dictionary Compression in In-Memory Column-Store Database Systems , 2014, EDBT.

[6]  Alexander Zeier,et al.  Speeding Up Queries in Column Stores - A Case for Compression , 2010, DaWak.

[7]  Norman May,et al.  The SAP HANA Database -- An Architecture Overview , 2012, IEEE Data Eng. Bull..

[8]  Sebastiano Vigna,et al.  UbiCrawler: a scalable fully distributed Web crawler , 2004, Softw. Pract. Exp..

[9]  Ahmad Yasin,et al.  A Top-Down method for performance analysis and counters architecture , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[10]  Giuseppe Ottaviano,et al.  Fast Compressed Tries through Path Decompositions , 2011, ALENEX.

[11]  Daniel J. Abadi,et al.  Integrating compression and execution in column-oriented database systems , 2006, SIGMOD Conference.

[12]  Alistair Moffat,et al.  Off-line dictionary-based compression , 1999, Proceedings of the IEEE.

[13]  Alexander Zeier,et al.  SIMD-Scan: Ultra Fast in-Memory Table Scan using on-Chip Vector Processing Units , 2009, Proc. VLDB Endow..

[14]  Johannes Fischer,et al.  LZ-Compressed String Dictionaries , 2014, 2014 Data Compression Conference.

[15]  Nieves R. Brisaboa,et al.  Practical compressed string dictionaries , 2016, Inf. Syst..

[16]  Ismail Oukid,et al.  Vectorizing Database Column Scans with Complex Predicates , 2013, ADMS@VLDB.