Ordering Pipelined Query Operators with Precedence Constraints

We consider the problem of optimally arranging a collection of query operators into a pipelined execution plan in the presence of precedence constraints among the operators. The goal is to maximize the rate at which input data items can be processed through the pipelined plan. We consider two scenarios: one in which each operator is assigned to run on a separate machine, and one in which all operators run on the same machine. In the former scenario, the operators work in parallel, so the cost of a plan is the maximum (or bottleneck) cost incurred by any operator in the plan. In the latter scenario, the cost of a plan is the sum of the costs incurred by the operators in the plan. These two cost metrics lead to fundamentally different optimization problems. Under the bottleneck cost metric, we give a general, polynomial-time greedy algorithm that always finds the optimal plan. Under the sum cost metric, the problem is much harder: we show that it is unlikely that any polynomial-time algorithm can approximate the optimal plan to within a factor smaller than $O(n^{\theta})$, where $n$ is the number of operators and $\theta$ is some positive constant. Finally, under the sum cost metric, for the special case in which the selectivity of each operator lies in $[\epsilon, 1-\epsilon]$, we give an algorithm that produces a $2$-approximation to the optimal plan but has running time exponential in $1/\epsilon$.
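To make the two cost metrics concrete, the following Python sketch evaluates both on a tiny instance. It rests on assumptions that the abstract does not state: plans are restricted to linear orderings, each operator has a per-item cost and a selectivity, and the cost an operator incurs is its per-item cost scaled by the product of the selectivities of the operators ahead of it. The names `operator_costs`, `respects`, and `best_order` are hypothetical, and the search is plain brute force over all orderings for illustration only; it is not the greedy algorithm or the approximation algorithm described above.

```python
from itertools import permutations

def operator_costs(order, cost, sel):
    """Cost incurred by each operator in a linear pipeline (assumed model):
    per-item cost times the fraction of input surviving upstream operators."""
    costs = []
    surviving = 1.0
    for op in order:
        costs.append(surviving * cost[op])
        surviving *= sel[op]
    return costs

def respects(order, precedes):
    """True if every (a, b) pair in `precedes` has a placed before b."""
    pos = {op: i for i, op in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in precedes)

def best_order(cost, sel, precedes, metric):
    """Brute-force search over valid orderings; `metric` is max (bottleneck)
    or sum. Exponential in the number of operators, so toy inputs only."""
    valid = (o for o in permutations(cost) if respects(o, precedes))
    return min(valid, key=lambda o: metric(operator_costs(o, cost, sel)))

# Hypothetical instance: three operators, with "a" required before "c".
cost = {"a": 4.0, "b": 1.0, "c": 2.0}
sel = {"a": 0.5, "b": 0.5, "c": 0.25}
precedes = [("a", "c")]

bottleneck_plan = best_order(cost, sel, precedes, max)
sum_plan = best_order(cost, sel, precedes, sum)
print(bottleneck_plan, max(operator_costs(bottleneck_plan, cost, sel)))
print(sum_plan, sum(operator_costs(sum_plan, cost, sel)))
```

On this instance the same ordering happens to minimize both metrics; in general the two metrics can favor different plans, which is why they are treated as separate optimization problems.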
