Paper Information - Keep Your Host Language Object and Also Query it: A Case for SQL Query Support in RDBMS for Host Language Objects


As a result of prolific growth in data science and machine learning applications, modern relational database management systems (RDBMS) are experimenting with various approaches to facilitate advanced analytical computations, in addition to the relational operations that they traditionally support. The most common approach has been to integrate an embedded high-level language (HLL) interpreter into the RDBMS, along with any additional libraries that specialize in numerical computations. Such implementations, e.g., user-defined functions (UDFs), generally follow a black-box setup, and for many complex workflows that require datasets to be passed and processed back and forth between the query execution engine and the embedded HLL interpreter, optimization opportunities have not yet been fully explored. In this paper, we propose and implement the concept of virtual tables, which expose dataset objects maintained by the embedded HLL interpreter to the query engine for executing relational operations. Unlike prevalent solutions, our approach minimizes data copies and conversions, performing them lazily and only when required. It also creates better optimization opportunities for SQL query execution, as the RDBMS can analyze the data characteristics of the HLL objects before producing an execution plan. The approach is also programmer-friendly, allowing for a more intuitive implementation of computational workflows. We perform evaluations over a variety of workloads, which demonstrate the performance and programming benefits of virtual tables.
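The virtual-table idea in the abstract can be illustrated with a small sketch. It is a toy model under stated assumptions, not the paper's implementation and not the API of any real RDBMS: the `VirtualTable` class, its `stats` and `scan` methods, and the `select_greater_than` helper are all hypothetical names introduced here. The sketch shows two properties the abstract mentions: the wrapper holds references to host-language columns without copying them, and a planner-like step can inspect data characteristics before deciding whether to scan at all.

```python
from typing import Any, Dict, Iterator, List, Sequence, Tuple


class VirtualTable:
    """Hypothetical wrapper exposing host-language columns to a query engine."""

    def __init__(self, name: str, columns: Dict[str, Sequence[Any]]):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.name = name
        self._columns = columns   # references only; no data is copied here
        self.rows_converted = 0   # counts rows handed over to the engine

    def schema(self) -> List[str]:
        return list(self._columns)

    def stats(self, column: str) -> Dict[str, Any]:
        """Data characteristics a planner could inspect before scanning."""
        values = self._columns[column]
        return {
            "count": len(values),
            "min": min(values, default=None),
            "max": max(values, default=None),
        }

    def scan(self, wanted: Sequence[str]) -> Iterator[Tuple[Any, ...]]:
        """Lazily yield rows, touching only the requested columns."""
        cols = [self._columns[c] for c in wanted]
        for row in zip(*cols):
            self.rows_converted += 1
            yield row


def select_greater_than(table: VirtualTable, column: str, threshold: Any,
                        projection: Sequence[str]) -> List[Tuple[Any, ...]]:
    """Toy stand-in for: SELECT projection FROM table WHERE column > threshold."""
    s = table.stats(column)
    if s["count"] == 0 or s["max"] <= threshold:
        return []  # statistics show nothing can match, so the scan is skipped
    needed = [column] + list(projection)
    return [row[1:] for row in table.scan(needed) if row[0] > threshold]


# Usage: the dict below stands in for a dataset object kept by the interpreter.
readings = {"sensor": ["a", "b", "c"], "temp": [18.5, 22.0, 30.1]}
vt = VirtualTable("readings", readings)
hot = select_greater_than(vt, "temp", 20.0, ["sensor"])
none = select_greater_than(vt, "temp", 100.0, ["sensor"])
```

In this sketch the second query never iterates over the data, because the maximum of `temp` already rules out any match. A real engine would rely on far richer statistics and on native column formats rather than Python tuples.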

Bettina Kemme | Joseph Vinish D'silva | Florestan De Moor
