Leveraging Page-Level Compression in MySQL - A Practice at Baidu

With large-scale data sets, disk I/O remains one of the main bottlenecks in a DBMS, while CPU resources are often not fully utilized. Compression is therefore introduced to make use of the spare computing resources and substantially reduce storage overhead. Commonly used compression algorithms can also improve performance when the database runs on HDD. On SSD, however, both read and write performance can be negatively affected by the slow process of compression and decompression. By quantitatively analyzing the impact of compression, we propose a balanced compression solution for SSD: read performance is accelerated by using a compression algorithm (lz4hc) with an extremely high decompression speed, and an asynchronous compression mechanism reduces write latency by moving compression to the background. We test the performance on a real data set collected from online database systems at Baidu. The results show that read performance on SSD improves by 25% compared with the uncompressed database and by 36% compared with the commonly used zlib compression. Meanwhile, write performance is up to 20% and 33% better than synchronous compression with lz4hc and zlib, respectively.
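
The idea of moving compression off the write path can be illustrated with a minimal sketch. This is not the implementation described in the paper: the class `AsyncPageCompressor` and its methods are hypothetical names, the page store is an in-memory dictionary, and Python's standard `zlib` module stands in for lz4hc, which is not in the standard library. A write stores the raw page and returns immediately; a background thread compresses the page later, and a read decompresses only if the page has already been compressed.

```python
import queue
import threading
import zlib


class AsyncPageCompressor:
    """Hypothetical sketch: accept page writes at once, compress in the background."""

    def __init__(self):
        self._pending = queue.Queue()   # page ids waiting for compression
        self._store = {}                # page_id -> (is_compressed, payload)
        self._lock = threading.Lock()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def write_page(self, page_id, data):
        # Write path: store the raw bytes and return without compressing.
        with self._lock:
            self._store[page_id] = (False, data)
        self._pending.put(page_id)

    def read_page(self, page_id):
        # Read path: decompress only if the background worker already ran.
        with self._lock:
            compressed, payload = self._store[page_id]
        return zlib.decompress(payload) if compressed else payload

    def _run(self):
        while True:
            page_id = self._pending.get()
            try:
                with self._lock:
                    compressed, payload = self._store[page_id]
                if compressed:
                    continue
                # zlib is an assumed stand-in for lz4hc here.
                packed = zlib.compress(payload)
                with self._lock:
                    current = self._store[page_id]
                    # Swap in the compressed copy only if the page was not
                    # overwritten meanwhile and compression actually helps.
                    if current[1] is payload and len(packed) < len(payload):
                        self._store[page_id] = (True, packed)
            finally:
                self._pending.task_done()

    def flush(self):
        # Block until every queued page has been processed.
        self._pending.join()
```

In this sketch the caller of `write_page` never waits for compression, which is the property the asynchronous mechanism targets; the trade-off is that a page briefly occupies its uncompressed size until the worker catches up.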
