Just-In-Time (JIT) compilation is increasingly becoming a key technology for modern database systems. It allows code to be generated on the fly to match the active query exactly. In the past, it has been argued that a query should be compiled into a single loop that performs all query actions, for example, all selects over all relevant columns. On the other hand, vectorization, a common feature in modern data systems, allows for better results by evaluating the query predicates sequentially in separate tight for-loops. In this paper, we study JIT compilation for modern in-memory column-stores in detail and show that, contrary to the common belief that the benefits of vectorization outweigh those of a single loop, there are cases in which creating a single loop is actually the optimal solution. In fact, choosing between multiple loops and a single loop is not a static decision; it depends on the (per-column) query selectivity. We perform our experiments on a modern column-store prototype that supports vectorization and show that, depending on selectivity, a different code layout is optimal. When a select operator is implemented with a no-branch design, multiple loops perform better than a single loop at low selectivity; otherwise, a single tight loop performs better.
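To make the two code layouts concrete, the following is a minimal sketch of a conjunctive selection over two columns, written once as a single fused loop and once as two sequential loops, both with a no-branch select. It is not the code generated by the prototype; the function names, the predicate form (`a < va AND b < vb`), and the use of `std::vector` are assumptions made for illustration, and the chunking of columns into vectors that a vectorized engine would perform is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical query: row ids where a[i] < va AND b[i] < vb.

// Layout 1: a single loop evaluates both predicates for every row.
// No-branch design: the row id is always written, and the output
// cursor advances only when the row qualifies.
std::vector<std::uint32_t> select_single_loop(const std::vector<int>& a,
                                              const std::vector<int>& b,
                                              int va, int vb) {
    assert(a.size() == b.size());
    std::vector<std::uint32_t> out(a.size());
    std::size_t k = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[k] = static_cast<std::uint32_t>(i);
        k += static_cast<std::size_t>((a[i] < va) & (b[i] < vb));
    }
    out.resize(k);
    return out;
}

// Layout 2: one tight loop per predicate. The first loop produces a
// selection vector; the second loop touches column b only for rows
// that survived the first predicate.
std::vector<std::uint32_t> select_multi_loop(const std::vector<int>& a,
                                             const std::vector<int>& b,
                                             int va, int vb) {
    assert(a.size() == b.size());
    std::vector<std::uint32_t> sel(a.size());
    std::size_t n1 = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sel[n1] = static_cast<std::uint32_t>(i);
        n1 += static_cast<std::size_t>(a[i] < va);
    }
    std::size_t n2 = 0;
    for (std::size_t j = 0; j < n1; ++j) {
        const std::uint32_t row = sel[j];
        sel[n2] = row;  // in-place compaction is safe because n2 <= j
        n2 += static_cast<std::size_t>(b[row] < vb);
    }
    sel.resize(n2);
    return sel;
}
```

Both functions return the same row ids; they differ only in how much work each loop does. The fused loop reads both columns for every row, while the second loop of the multi-loop layout reads column `b` only for rows that passed the first predicate, which is why the relative cost of the two layouts can shift with selectivity.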