Pipelining in multi-query optimization

Database systems frequently have to execute a set of related queries that share several common subexpressions. Multi-query optimization exploits this by finding evaluation plans that share common results. Current approaches to multi-query optimization assume that common subexpressions are materialized. Significant performance benefits can be obtained if common subexpressions are instead pipelined to their uses, without being materialized. However, as we show, plans with pipelining may not always be realizable with limited buffer space. We present a general model for schedules with pipelining, and give a necessary and sufficient condition for determining the validity of a schedule under our model. We show that finding a valid schedule with minimum cost is NP-hard, and we present a greedy heuristic for finding good schedules. Finally, we present a performance study that shows the benefit of our algorithms on batches of queries from the TPC-D benchmark.
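The buffer-space issue can be illustrated with a toy sketch. This is not the schedule model or the validity condition of the paper; the class `SharedPipe`, its `capacity` parameter, and the convention of returning `None` at end of stream are all hypothetical names chosen for illustration. The sketch fans one shared result out to two consumers through a bounded buffer: when the consumers read at the same rate a tiny buffer suffices, but when one consumer runs ahead the buffer overflows, and the shared result would have to be materialized instead.

```python
from collections import deque


class SharedPipe:
    """Toy fan-out of one producer to several consumers via a bounded buffer.

    Illustrative assumption only: this is not the pipelining model of the paper.
    """

    def __init__(self, source, n_consumers, capacity):
        self.source = iter(source)
        self.capacity = capacity
        self.buffer = deque()          # tuples not yet read by every consumer
        self.base = 0                  # stream position of buffer[0]
        self.pos = [0] * n_consumers   # next position each consumer will read

    def read(self, consumer):
        """Return the next tuple for `consumer`, or None at end of stream."""
        i = self.pos[consumer]
        if i == self.base + len(self.buffer):
            # This consumer is the furthest ahead: a new tuple must be produced.
            if len(self.buffer) >= self.capacity:
                raise BufferError("slow consumer: bounded buffer is full")
            try:
                self.buffer.append(next(self.source))
            except StopIteration:
                return None
        item = self.buffer[i - self.base]
        self.pos[consumer] = i + 1
        # Discard tuples that every consumer has already read.
        while self.buffer and min(self.pos) > self.base:
            self.buffer.popleft()
            self.base += 1
        return item


# Consumers in lockstep: the shared result is pipelined within the buffer limit.
pipe = SharedPipe(range(5), n_consumers=2, capacity=2)
lockstep = [(pipe.read(0), pipe.read(1)) for _ in range(5)]

# One consumer runs ahead: the third read exceeds the buffer capacity.
skewed = SharedPipe(range(5), n_consumers=2, capacity=2)
skewed.read(0)
skewed.read(0)
try:
    skewed.read(0)
    overflowed = False
except BufferError:
    overflowed = True
```

In the skewed case, a realizable plan would need either a larger buffer or materialization of the shared result, which is the kind of trade-off a pipelined schedule has to account for.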
