Efficient Execution of Multiple Queries on Deep Memory Hierarchy

This paper proposes a complementary technique, called MiniTasking, that further reduces the number of cache misses by improving data temporal locality across multiple concurrent queries. The idea rests on the observation that in many workloads, such as decision support systems (DSS), concurrent queries usually share a significant amount of data. MiniTasking exploits this sharing to improve temporal locality by scheduling query execution at three levels: query-level batching, operator-level grouping, and mini-task-level scheduling. Experimental results with various types of concurrent TPC-H query workloads show that, with the traditional N-ary Storage Model (NSM) layout, MiniTasking reduces L2 cache misses by up to 83% and thereby achieves a 24% reduction in execution time. With the Partition Attributes Across (PAX) layout, MiniTasking further reduces cache misses by 65% and execution time by 9%. For the TPC-H throughput test workload, MiniTasking improves end performance by up to 20%.
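The mini-task level can be pictured with a small sketch. It is only an illustration under stated assumptions, not the paper's implementation: the names `run_per_query` and `run_minitasks`, the row format, and the `chunk_rows` parameter are all hypothetical, and Python does not expose cache behaviour, so the code shows only the order in which shared data is touched. The assumption is that several operators from different queries scan the same table, and that a chunk small enough to stay cache-resident is handed to every operator before the scan moves on.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, int]
# Hypothetical operator type: takes a chunk of rows, returns its partial output.
Operator = Callable[[Sequence[Row]], List[int]]


def run_per_query(table: Sequence[Row], operators: Sequence[Operator]) -> List[List[int]]:
    """Baseline: each operator scans the whole table on its own pass."""
    return [list(op(table)) for op in operators]


def run_minitasks(
    table: Sequence[Row], operators: Sequence[Operator], chunk_rows: int
) -> List[List[int]]:
    """Illustrative mini-task order: every operator processes one chunk
    before the next chunk is touched, so each chunk is reused while it
    is still recently accessed."""
    if chunk_rows <= 0:
        raise ValueError("chunk_rows must be positive")
    results: List[List[int]] = [[] for _ in operators]
    for start in range(0, len(table), chunk_rows):
        chunk = table[start:start + chunk_rows]  # one mini-task's input
        for i, op in enumerate(operators):
            results[i].extend(op(chunk))
    return results


if __name__ == "__main__":
    table = [{"k": i, "v": i * 2} for i in range(10)]
    ops = [
        lambda rows: [r["k"] for r in rows if r["k"] % 2 == 0],
        lambda rows: [r["v"] for r in rows if r["v"] > 10],
    ]
    # Both schedules produce the same answers; only the access order differs.
    assert run_minitasks(table, ops, chunk_rows=3) == run_per_query(table, ops)
```

The sketch only works for operators whose output over the whole table is the concatenation of their outputs over chunks, such as selections and projections. Aggregates or joins would need per-operator state carried between chunks, which is omitted here.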
