HorseIR: bringing array programming languages together with database query processing

Relational database management systems (RDBMSs) are operationally similar to dynamic language processors: they take SQL queries as input, dynamically generate an optimized execution plan, and then execute it. In recent decades, the emergence of in-memory databases with columnar storage, which use array-like storage structures, has shifted the focus of optimization from the traditional I/O bottleneck to CPU and memory. However, database research so far has primarily focused on CPU cache optimizations. The similarity between the computational characteristics of such database workloads and the optimizations developed for array programming languages is largely unexplored. We believe that these database implementations can benefit from merging database optimizations with dynamic array-based programming language approaches. Therefore, in this paper, we propose a novel approach to optimizing database query execution using a new array-based intermediate representation, HorseIR, that sits between database queries and compiled code. Furthermore, we provide a translator that generates HorseIR from database execution plans and a compiler that optimizes HorseIR and generates efficient code. We compare HorseIR with the MonetDB RDBMS on standard SQL queries and show how our approach and compiler optimizations improve the runtime of complex queries.
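To make the idea of an array-based layer between a query and compiled code more concrete, the following is a minimal sketch in Python, not HorseIR syntax. It assumes a hypothetical query, `SELECT SUM(price * qty) FROM items WHERE qty > 2`, over two columns stored as arrays. The primitive names (`gt`, `compress`, `mul`) are invented for illustration, and loop fusion is shown only as one example of an array-language optimization; the abstract does not state which optimizations the HorseIR compiler applies.

```python
from typing import List


def gt(col: List[int], k: int) -> List[bool]:
    """Element-wise comparison: produces a boolean mask."""
    return [x > k for x in col]


def compress(mask: List[bool], col: List[int]) -> List[int]:
    """Keep only the elements whose mask entry is True."""
    return [x for m, x in zip(mask, col) if m]


def mul(a: List[int], b: List[int]) -> List[int]:
    """Element-wise multiplication of two equal-length columns."""
    return [x * y for x, y in zip(a, b)]


def query_unfused(price: List[int], qty: List[int]) -> int:
    """The query as a chain of whole-column primitives.

    Each step materializes an intermediate array.
    """
    mask = gt(qty, 2)
    p = compress(mask, price)
    q = compress(mask, qty)
    return sum(mul(p, q))


def query_fused(price: List[int], qty: List[int]) -> int:
    """The same query after (hand-applied) loop fusion.

    A single pass over the columns, with no intermediate arrays.
    """
    total = 0
    for p, q in zip(price, qty):
        if q > 2:
            total += p * q
    return total


price = [10, 20, 30, 40]
qty = [1, 3, 2, 5]
result_unfused = query_unfused(price, qty)
result_fused = query_fused(price, qty)
```

Both functions compute the same result; the fused form illustrates the kind of code a compiler might aim to produce from a chain of array operations, under the assumptions stated above.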
