Efficiently Compiling Efficient Query Plans for Modern Hardware

As main memory grows, query performance is increasingly determined by the raw CPU costs of query processing itself. The classical iterator-style query processing technique is very simple and flexible, but performs poorly on modern CPUs due to a lack of locality and frequent branch mispredictions. Several techniques, such as batch-oriented processing and vectorized tuple processing, have been proposed to improve this situation, but even these techniques are frequently outperformed by hand-written execution plans. In this work we present a novel compilation strategy that translates a query into compact and efficient machine code using the LLVM compiler framework. By aiming at good code and data locality and a predictable branch layout, the resulting code frequently rivals the performance of hand-written C++ code. We integrated these techniques into the HyPer main memory database system and show that this results in excellent query performance while requiring only modest compilation time.
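
The contrast between iterator-style processing and hand-written code can be illustrated with a minimal C++17 sketch. It is not the code HyPer generates (the paper targets LLVM); the names `Operator`, `Scan`, `Filter`, `sum_iterator`, and `sum_fused` are hypothetical and exist only for this illustration. Both functions evaluate the same query, a sum over all values greater than a threshold, once through a tree of operators with one virtual `next()` call per tuple and once as a single fused loop of the kind a query compiler might aim to produce.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical single-column tuple, used only for this illustration.
using Tuple = std::int64_t;

// Iterator-style operator interface: every tuple crosses a virtual call.
struct Operator {
    virtual ~Operator() = default;
    virtual std::optional<Tuple> next() = 0;
};

// Produces the tuples of an in-memory column one at a time.
struct Scan : Operator {
    const std::vector<Tuple>& data;
    std::size_t pos = 0;
    explicit Scan(const std::vector<Tuple>& d) : data(d) {}
    std::optional<Tuple> next() override {
        if (pos == data.size()) return std::nullopt;
        return data[pos++];
    }
};

// Passes on only the tuples that are greater than the threshold.
struct Filter : Operator {
    std::unique_ptr<Operator> child;
    Tuple threshold;
    Filter(std::unique_ptr<Operator> c, Tuple t)
        : child(std::move(c)), threshold(t) {}
    std::optional<Tuple> next() override {
        while (auto t = child->next()) {
            if (*t > threshold) return t;
        }
        return std::nullopt;
    }
};

// Iterator-style plan: pull tuples through Filter -> Scan and add them up.
Tuple sum_iterator(const std::vector<Tuple>& data, Tuple threshold) {
    Filter root(std::make_unique<Scan>(data), threshold);
    Tuple sum = 0;
    while (auto t = root.next()) sum += *t;
    return sum;
}

// Fused plan: scan, filter, and aggregation collapsed into one tight loop,
// with no virtual calls and the running sum kept in a local variable.
Tuple sum_fused(const std::vector<Tuple>& data, Tuple threshold) {
    Tuple sum = 0;
    for (Tuple t : data) {
        if (t > threshold) sum += t;
    }
    return sum;
}
```

Both functions return the same result for the same input; the difference lies only in how much per-tuple call overhead and indirection the CPU has to execute.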
