Cache-conscious buffering for database operators with state

Database processes must be cache-efficient to use modern hardware effectively. In this paper, we analyze the importance of temporal locality and the resulting cache behavior when scheduling database operators for in-memory, block-oriented query processing. We demonstrate that the overall performance of a workload of multiple database operators depends strongly on how the operators are interleaved. Longer time slices, combined with temporal locality within an operator, amortize the initial compulsory cache misses needed to load the operator's state, such as a hash table, into the cache. Running an operator to completion over all of its input yields the greatest amortization of cache misses, but this is typically infeasible because of the large intermediate storage needed to materialize all of the operator's input tuples. We show experimentally that good cache performance can be obtained with smaller buffers whose size is determined at runtime. We demonstrate a low-overhead method of runtime cache miss sampling using hardware performance counters. Our evaluation considers two common database operators with state: aggregation and hash join. Sampling reveals each operator's temporal locality and cache miss behavior, and we use those characteristics to choose an appropriate input buffer (block) size. The calculated buffer size balances cache miss amortization against buffer memory requirements.
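The trade-off between cache miss amortization and buffer memory can be illustrated with a small sketch. This is not the paper's method: the cost model, the function names, and the parameters (`load_misses`, `steady_misses_per_tuple`, `tolerance`, `max_tuples`) are all assumptions made for illustration. The sketch assumes that each time slice pays a fixed number of compulsory misses to reload the operator's state, plus a constant number of misses per tuple, and it picks the smallest buffer for which the amortized reload cost stays within a chosen fraction of the per-tuple cost.

```python
import math


def amortized_misses_per_tuple(load_misses, steady_misses_per_tuple, buffer_tuples):
    """Misses per tuple when the state is reloaded once per buffer of tuples.

    Assumed model: a fixed reload cost spread over the buffer, plus a
    constant steady-state cost per tuple.
    """
    return load_misses / buffer_tuples + steady_misses_per_tuple


def choose_buffer_tuples(load_misses, steady_misses_per_tuple,
                         tolerance=0.25, max_tuples=1 << 20):
    """Smallest buffer size (in tuples) whose amortized reload cost is at
    most `tolerance` times the steady-state per-tuple cost, capped at
    `max_tuples` to bound buffer memory.
    """
    if steady_misses_per_tuple <= 0:
        # No steady-state cost to compare against: fall back to the cap.
        return max_tuples
    # load_misses / n <= tolerance * steady  =>  n >= load_misses / (tolerance * steady)
    needed = math.ceil(load_misses / (tolerance * steady_misses_per_tuple))
    return max(1, min(needed, max_tuples))


# Hypothetical sampled values: 4096 misses to reload a hash table,
# 2 misses per tuple once the state is cached.
size = choose_buffer_tuples(4096, 2.0, tolerance=0.25)
print(size, amortized_misses_per_tuple(4096, 2.0, size))
```

In this sketch, the two inputs stand in for quantities that would be estimated at runtime, for example from sampled hardware performance counters. The cap expresses the memory side of the balance: a larger buffer always lowers the amortized cost under this model, so some bound is needed to keep intermediate storage small.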
