Optimizing Write Performance for Read Optimized Databases

Compression in column-oriented databases has been shown to improve performance and reduce storage consumption. This is especially true for read access, as compressed data can be processed directly during query execution. Compression is disadvantageous for write access, however, because of unavoidable re-compression: a write requires reading significantly more data than the operation itself involves, more tuples may have to be modified depending on the compression algorithm, and table-level locks have to be acquired instead of row-level locks as long as no second version of the data is stored. As a result, the duration of a single modification (insert or update) significantly limits both throughput and response time. In this paper, we propose an additional write-optimized buffer that maintains the delta which, in conjunction with the compressed main store, represents the current state of the data. This buffer uses an uncompressed, column-oriented data structure. To address the disadvantages of data compression, we improve write performance at the expense of query performance and memory consumption by using the buffer as intermediate storage for several modifications, which are then propagated to the main store in bulk by a merge operation. The overhead of a single re-compression is thereby shared among all recent modifications. We evaluate our implementation inside SAP’s in-memory column store. We then analyze the parameters that influence the merge process and provide a complexity analysis. Finally, we show optimizations regarding resource consumption and merge duration.
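The interplay between the compressed main store and the uncompressed delta buffer can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: it assumes a sorted-dictionary encoding for the main store (the abstract does not name a compression scheme), handles inserts only, and the names `Column`, `insert`, `scan`, `count_equal`, and `merge` are hypothetical.

```python
from bisect import bisect_left


class Column:
    """One column: a dictionary-compressed main store plus an uncompressed delta.

    Assumption: the main store uses a sorted dictionary with integer value IDs.
    """

    def __init__(self, values=()):
        self.dictionary = []        # sorted distinct values of the main store
        self.main = []              # value IDs (positions in self.dictionary)
        self.delta = list(values)   # uncompressed, append-only write buffer
        self.merge()

    def insert(self, value):
        # A write only appends to the delta; the main store is not re-compressed.
        self.delta.append(value)

    def scan(self):
        # The current state is the decoded main store followed by the delta.
        for vid in self.main:
            yield self.dictionary[vid]
        yield from self.delta

    def count_equal(self, value):
        # Main store: compare integer IDs. Delta: compare raw values.
        i = bisect_left(self.dictionary, value)
        hits = 0
        if i < len(self.dictionary) and self.dictionary[i] == value:
            hits = sum(1 for vid in self.main if vid == i)
        return hits + sum(1 for v in self.delta if v == value)

    def merge(self):
        # One re-compression pays for all buffered modifications at once.
        new_dict = sorted(set(self.dictionary) | set(self.delta))
        # Map each old value ID to its position in the new dictionary.
        remap = [bisect_left(new_dict, v) for v in self.dictionary]
        new_main = [remap[vid] for vid in self.main]
        new_main.extend(bisect_left(new_dict, v) for v in self.delta)
        self.dictionary, self.main, self.delta = new_dict, new_main, []
```

In this sketch every query has to consult both structures, and the delta occupies memory uncompressed, which is the cost side of the trade-off. In exchange, `insert` is a plain append, and the dictionary rebuild in `merge` runs once per batch rather than once per modification.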
