Juggling Functions Inside a Database

We define and study the Functional Aggregate Query (FAQ) problem, which captures common computational tasks across a wide range of domains, including relational databases, logic, matrix and tensor computation, probabilistic graphical models, constraint satisfaction, and signal processing. Simply put, an FAQ is a declarative way of defining a new function from a database of input functions. We present InsideOut, a dynamic programming algorithm for evaluating an FAQ. The algorithm rewrites the input query into a set of easier-to-compute FAQ sub-queries, each of which is then evaluated using a worst-case optimal relational join algorithm. The design of algorithms that optimally evaluate the classic multiway join problem has seen exciting developments in the past few years. Our framework connects these new ideas in database theory with a large number of application areas in a coherent manner. It suggests that, with the right abstraction (one that blurs the distinction between data and computation), a good database engine can be a general-purpose constraint solver, relational data store, graphical model inference engine, and matrix/tensor computation processor all at once. The InsideOut algorithm is simple to describe. Yet, despite solving an extremely general problem, its runtime matches or improves upon that of the best known algorithms for the applications to which FAQ specializes. These corollaries include computational tasks in graphical model inference, matrix/tensor operations, relational joins, and logic. Better yet, InsideOut can be used within any database engine, because it is essentially a principled way of rewriting queries. Indeed, it is already part of the LogicBlox database engine, where it helps to efficiently answer traditional database queries and graphical model inference queries, and to train a large class of machine learning models inside the database itself.
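
To make the idea of "defining a new function from a database of input functions" concrete, the following is a minimal Python sketch of a sum-product FAQ evaluated by eliminating bound variables one at a time. It illustrates only the rewriting idea and is not the InsideOut algorithm itself: each sub-query is computed here by brute-force enumeration over the domain rather than by a worst-case optimal join, and only the (+, x) semiring is handled. The names `eliminate` and `evaluate`, and the representation of a factor as a `(variables, table)` pair that lists only non-zero entries, are assumptions made for this illustration.

```python
from itertools import product


def eliminate(factors, var, domain):
    """Sum out `var`: combine the factors that mention it into one new factor."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    # The new factor ranges over every other variable the touching factors use.
    new_vars = sorted({v for vars_, _ in touching for v in vars_} - {var})
    table = {}
    for assignment in product(domain, repeat=len(new_vars)):
        env = dict(zip(new_vars, assignment))
        total = 0
        for x in domain:
            env[var] = x
            term = 1
            for vars_, tab in touching:
                # Entries missing from a table are treated as 0.
                term *= tab.get(tuple(env[v] for v in vars_), 0)
            total += term
        if total != 0:
            table[assignment] = total
    return rest + [(tuple(new_vars), table)]


def evaluate(factors, free_vars, order, domain):
    """Eliminate the bound variables in `order`, then tabulate the free ones."""
    for var in order:
        factors = eliminate(factors, var, domain)
    result = {}
    for assignment in product(domain, repeat=len(free_vars)):
        env = dict(zip(free_vars, assignment))
        value = 1
        for vars_, tab in factors:
            value *= tab.get(tuple(env[v] for v in vars_), 0)
        if value != 0:
            result[assignment] = value
    return result


# Matrix multiplication as an FAQ: C(i, k) = sum_j A(i, j) * B(j, k).
A = (("i", "j"), {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4})
B = (("j", "k"), {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8})
C = evaluate([A, B], free_vars=("i", "k"), order=["j"], domain=range(2))

# Counting ordered triangles: sum over a, b, c of E(a, b) * E(b, c) * E(a, c).
E = {(u, v): 1 for u in range(3) for v in range(3) if u != v}
edges = [(("a", "b"), E), (("b", "c"), E), (("a", "c"), E)]
triangles = evaluate(edges, free_vars=(), order=["a", "b", "c"], domain=range(3))
```

In this sketch the same two functions answer a matrix product and a graph-pattern count; only the input factors and the choice of free variables differ, which is the sense in which one query form covers several application areas. The elimination order is supplied by hand here, and the sketch makes no attempt to choose a good one.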
