Canonical abstraction for outerjoin optimization

Outerjoins are an important class of joins and are widely used in various kinds of applications. It is challenging to optimize queries that contain outerjoins because outerjoins do not always commute with inner joins. Previous work has studied this problem and provided techniques that allow certain reordering of the join sequences. However, the optimization of outerjoin queries is still not as powerful as that of inner joins.An inner join query can always be canonically represented as a sequence of Cartesian products of all relations, followed by a sequence of selection operations, each applying a conjunct in the join predicates. This canonical abstraction is very powerful because it enables the optimizer to use any join sequence for plan generation. Unfortunately, such a canonical abstraction for outerjoin queries has not been developed. As a result, existing techniques always exclude certain join sequences from planning, which can lead to a severe performance penalty.Given a query consisting of a sequence of inner and outer joins, we, for the first time, present a canonical abstraction based on three operations: outer Cartesian products, nullification, and best match. Like the inner join abstraction, our outerjoin abstraction permits all join sequences, and preserves the property of both commutativity and transitivity among predicates. This allows us to generate plans that are very desirable for performance reasons but that couldn't be done before. We present an algorithm that produces such a canonical abstraction, and a method that extends an inner-join optimizer to generate plans in an expanded search space. We also describe an efficient implementation of the best match operation using the OLAP functionalities in SQL:1999. Our experimental results show that our technique can significantly improve the performance of outerjoin queries.

[1]  Arnon Rosenthal,et al.  Query graphs, implementing trees, and freely-reorderable outerjoins , 1990, SIGMOD '90.

[2]  Hamid Pirahesh,et al.  Using EELs, a practical approach to outerjoin and antijoin reordering , 2001, Proceedings 17th International Conference on Data Engineering.

[3]  Arnon Rosenthal,et al.  Outerjoin simplification and reordering for query optimization , 1997, TODS.

[4]  Laura M. Haas,et al.  Clio: a semi-automatic tool for schema mapping , 2001, SIGMOD '01.

[5]  Won Kim,et al.  On optimizing an SQL-like nested query , 1982, TODS.

[6]  Umeshwar Dayal,et al.  Processing queries with quantifiers a horticultural approach , 1983, PODS.

[7]  Jim Melton,et al.  Advanced SQL:1999: Understanding Object-Relational and Other Advanced Features , 2002 .

[8]  Goetz Graefe,et al.  The Volcano optimizer generator: extensibility and efficient search , 1993, Proceedings of IEEE 9th International Conference on Data Engineering.

[9]  D. R. Fulkerson,et al.  Flows in Networks , 1963 .

[10]  C. Galindo-Legaria Algebraic optimization of outerjoin queries , 1992 .

[11]  E. F. Codd,et al.  Extending the database relational model to capture more meaning , 1979, ACM Trans. Database Syst..

[12]  Patricia G. Selinger,et al.  Access path selection in a relational database management system , 1979, SIGMOD '79.

[13]  Arnon Rosenthal,et al.  How to extend a conventional optimizer to handle one- and two-sided outerjoin , 1992, [1992] Eighth International Conference on Data Engineering.

[14]  Jeffrey D. Ullman,et al.  Principles of Database and Knowledge-Base Systems, Volume II , 1988, Principles of computer science series.

[15]  César A. Galindo-Legaria,et al.  Outerjoins as disjunctions , 1994, SIGMOD '94.