Column-based RLE in row-oriented database

In database systems, disk I/O is usually the bottleneck of query processing. Compression is one of the most important techniques for reducing disk accesses and thereby improving system performance. RLE (Run-Length Encoding) is a lightweight compression algorithm that incurs negligible CPU cost. Prior work shows that, although RLE is one of the most effective compression techniques in column-oriented systems, it is very hard to use in row-oriented systems, where values from multiple attributes are stored in the same page and value locality is therefore poor. We propose CRLE (Column-based RLE), a compression algorithm that applies RLE to row-oriented data storage. Within a row-oriented storage page, CRLE exploits value locality in each individual column and encodes values from the same column in run-length format. Experiments show that CRLE achieves a very good compression ratio and good performance despite the row-oriented data storage.
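The idea of encoding each column separately within a row-oriented page can be illustrated with a small sketch. This is not the paper's implementation: the page is modeled as a plain list of tuples, and the names `rle_encode`, `rle_decode`, `compress_page`, and `decompress_page` are hypothetical helpers introduced only for illustration. The sample data is made up to show the effect of value locality.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


def compress_page(rows):
    """Encode one page of rows column by column (hypothetical page model).

    rows is a list of equal-length tuples that would share one storage page.
    The result holds one run list per column.
    """
    if not rows:
        return []
    return [rle_encode(column) for column in zip(*rows)]


def decompress_page(encoded_columns):
    """Rebuild the original rows from the per-column run lists."""
    columns = [rle_decode(runs) for runs in encoded_columns]
    return [tuple(row) for row in zip(*columns)]


# Illustrative page: adjacent rows often share values within a column,
# but adjacent fields within a row rarely do.
page = [
    ("US", "open", 1),
    ("US", "open", 2),
    ("US", "closed", 3),
    ("DE", "closed", 4),
]

encoded = compress_page(page)

# Plain RLE over the values in row order finds no runs at all.
row_order_runs = rle_encode([value for row in page for value in row])
column_runs = sum(len(runs) for runs in encoded)
print(len(row_order_runs), column_runs)
```

In this toy page, run-length encoding the values in row order yields one run per value, while the per-column encoding merges the repeated values in the first two columns. A real page format would also need to store run boundaries compactly and support tuple lookup, which this sketch leaves out.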
