One Loop Does Not Fit All

Just-In-Time (JIT) compilation is increasingly becoming a key technology for modern database systems. It allows code to be generated on the fly to match the active query exactly. In the past, it has been argued that a query should be compiled into a single loop that performs all query actions, for example, all selects over all relevant columns. On the other hand, vectorization, a common feature in modern data systems, allows for better results by evaluating the query predicates sequentially, each in its own tight for-loop. In this paper, we study JIT compilation for modern in-memory column-stores in detail and we show that, contrary to the common belief that vectorization outweighs the benefits of having one loop, there are cases in which creating a single loop is actually the optimal solution. In fact, choosing between multiple loops and a single loop is not a static decision; instead, it depends on the (per-column) query selectivity. We perform our experiments on a modern column-store prototype that supports vectorization and we show that, depending on selectivity, a different code layout is optimal. When a select operator is implemented with a no-branch design, creating multiple loops performs better than a single loop for low selectivity. Otherwise, a single tight loop performs better.
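To make the two code layouts concrete, the following is a minimal sketch of a two-predicate, no-branch select written both ways. It is an illustration only, not the code generated by the prototype: the names `Column`, `Positions`, `select_single_loop`, and `select_multi_loop`, the `<` predicates, and the position-list output are all assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; not taken from the paper.
using Column = std::vector<int32_t>;
using Positions = std::vector<uint32_t>;

// Single-loop layout: both predicates are evaluated for every row in one pass.
// No-branch design: the position is always written, and the output cursor
// advances only when the row qualifies. Assumes a.size() == b.size().
Positions select_single_loop(const Column& a, const Column& b,
                             int32_t va, int32_t vb) {
    Positions out(a.size());
    std::size_t n = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[n] = static_cast<uint32_t>(i);
        n += static_cast<std::size_t>((a[i] < va) & (b[i] < vb));
    }
    out.resize(n);
    return out;
}

// Multi-loop layout: one tight loop per predicate. The second loop only
// visits the positions that survived the first predicate.
Positions select_multi_loop(const Column& a, const Column& b,
                            int32_t va, int32_t vb) {
    Positions out(a.size());
    std::size_t n = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[n] = static_cast<uint32_t>(i);
        n += static_cast<std::size_t>(a[i] < va);
    }
    // Filter the intermediate position list in place; m never exceeds j,
    // so each position is read before its slot can be overwritten.
    std::size_t m = 0;
    for (std::size_t j = 0; j < n; ++j) {
        const uint32_t pos = out[j];
        out[m] = pos;
        m += static_cast<std::size_t>(b[pos] < vb);
    }
    out.resize(m);
    return out;
}
```

Both functions return the same qualifying positions. They differ in how much work each loop does: the single loop reads both columns for every row, while the second loop of the multi-loop variant touches column `b` only at positions that passed the first predicate, at the cost of materializing an intermediate position list. How these costs balance is what varies with selectivity.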
