Solving the Join Ordering Problem via Mixed Integer Linear Programming

We transform join ordering into a mixed integer linear program (MILP). This allows to address query optimization by mature MILP solver implementations that have evolved over decades and steadily improved their performance. They offer features such as anytime optimization and parallel search that are highly relevant for query optimization. We present a MILP formulation for searching left-deep query plans. We use sets of binary variables to represent join operands and intermediate results, operator implementation choices or the presence of interesting orders. Linear constraints restrict value assignments to the ones representing valid query plans. We approximate the cost of scan and join operations via linear functions, allowing to increase approximation precision up to arbitrary degrees. We integrated a prototypical implementation of our approach into the Postgres optimizer and compare against the original optimizer and several variants. Our experimental results are encouraging: we are able to optimize queries joining 40 tables within less than one minute of optimization time. Such query sizes are far beyond the capabilities of traditional query optimization algorithms with worst case guarantees on plan quality. Furthermore, as we use an existing solver, our optimizer implementation is small and can be integrated with low overhead.

[1]  David Maier,et al.  Rapid bushy join-order optimization with Cartesian products , 1996, SIGMOD '96.

[2]  Surajit Chaudhuri,et al.  Query optimizers: time to rethink the contract? , 2009, SIGMOD Conference.

[3]  Surajit Chaudhuri,et al.  Optimization of queries with user-defined predicates , 1996, TODS.

[4]  Murat Ali Bayir,et al.  Integer Linear Programming Solution for the Multiple Query Optimization Problem , 2014, ISCIS.

[5]  Volker Markl,et al.  Parallelizing query optimization , 2008, Proc. VLDB Endow..

[6]  Guy M. Lohman,et al.  Measuring the Complexity of Join Enumeration in Query Optimization , 1990, VLDB.

[7]  George C. Caragea,et al.  Orca: a modular query optimizer architecture for big data , 2014, SIGMOD Conference.

[8]  Patricia G. Selinger,et al.  Access path selection in a relational database management system , 1979, SIGMOD '79.

[9]  Christoph Koch,et al.  An Incremental Anytime Algorithm for Multi-Objective Query Optimization , 2015, SIGMOD Conference.

[10]  Arun N. Swami,et al.  Optimization of large join queries: combining heuristics and combinatorial techniques , 1989, SIGMOD '89.

[11]  Guido Moerkotte,et al.  Dynamic programming strikes back , 2008, SIGMOD Conference.

[12]  Nicolas Bruno,et al.  Polynomial heuristics for query optimization , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[13]  Sumit Ganguly,et al.  Design and Analysis of Parametric Query Optimization Algorithms , 1998, VLDB.

[14]  Ion Stoica,et al.  BlinkDB: queries with bounded errors and bounded response times on very large data , 2012, EuroSys '13.

[15]  Michael C. Ferris,et al.  A Genetic Algorithm for Database Query Optimization , 1991, ICGA.

[16]  Christopher Ré,et al.  General Database Statistics Using Entropy Maximization , 2009, DBPL.

[17]  Robert E. Bixby,et al.  A Brief History of Linear and Mixed-Integer Programming Computation , 2012 .

[18]  Michael Stonebraker,et al.  Predicate migration: optimizing queries with expensive predicates , 1992, SIGMOD Conference.

[19]  Guido Moerkotte,et al.  Heuristic and randomized optimization for the join ordering problem , 1997, The VLDB Journal.

[20]  Dan Suciu,et al.  Skew in parallel query processing , 2014, PODS.

[21]  Arun N. Swami,et al.  Optimization of large join queries , 1988, SIGMOD '88.

[22]  Guido Moerkotte,et al.  On the Complexity of Generating Optimal Left-Deep Processing Trees with Cross Products , 1995, ICDT.

[23]  M. Muralikrishna,et al.  Improved Unnesting Algorithms for Join Aggregate SQL Queries , 1992, VLDB.

[24]  Yannis E. Ioannidis,et al.  Randomized algorithms for optimizing large join queries , 1990, SIGMOD '90.

[25]  Anastasia Ailamaki,et al.  An Integer Linear Programming Approach to Database Design , 2007, 2007 IEEE 23rd International Conference on Data Engineering Workshop.

[26]  Christoph Koch,et al.  Multi-objective parametric query optimization , 2014, SGMD.

[27]  Joseph M. Hellerstein,et al.  Parallelizing extensible query optimizers , 2009, SIGMOD Conference.

[28]  Jinsoo Lee,et al.  Dependency-aware reordering for parallelizing query optimization in multi-core CPUs , 2009, SIGMOD Conference.

[29]  Sumit Ganguly,et al.  On the complexity of approximate query optimization , 2002, PODS '02.

[30]  Guido Moerkotte,et al.  Analysis of two existing and one new dynamic programming algorithm for the generation of optimal bushy join trees without cross products , 2006, VLDB.

[31]  John A Lawrence,et al.  Applied Management Science , 2005 .

[32]  S. Sudarshan,et al.  AniPQO: Almost Non-intrusive Parametric Query Optimization for Nonlinear Cost Functions , 2003, VLDB.

[33]  Guido Moerkotte,et al.  Optimizing disjunctive queries with expensive predicates , 1994, SIGMOD '94.

[34]  S. Sudarshan,et al.  Parametric Query Optimization for Linear and Piecewise Linear Cost Functions , 2002, VLDB.

[35]  Christoph Koch,et al.  Parallelizing Query Optimization on Shared-Nothing Architectures , 2016, Proc. VLDB Endow..

[36]  Jian Yang,et al.  Algorithms for Materialized View Design in Data Warehousing Environment , 1997, VLDB.