Static optimization of conjunctive queries with sliding windows over infinite streams

We define a framework for static optimization of sliding window conjunctive queries over infinite streams. When computational resources are sufficient, we propose that the goal of optimization should be to find an execution plan that minimizes resource usage within the available resource constraints. When resources are insufficient, on the other hand, we propose that the goal should be to find an execution plan that sheds some of the input load (by randomly dropping tuples) to keep resource usage within bounds while maximizing the output rate. An intuitive approach to load shedding suggests starting with the plan that would be optimal if resources were sufficient and adding "drop boxes" to this plan. We find this to be often times suboptimal - in many instances the optimal partial answer plan results from adding drop boxes to plans that are not optimal in the unlimited resource case. In view of this, we use our framework to investigate an approach to optimization that unifies the placement of drop boxes and the choice of the query plan from which to drop tuples. The effectiveness of our optimizer is experimentally validated and the results show the promise of this approach.

[1]  Michael Stonebraker,et al.  Load Shedding in a Data Stream Manager , 2003, VLDB.

[2]  Joseph M. Hellerstein,et al.  Optimization techniques for queries with expensive methods , 1998, TODS.

[3]  David J. DeWitt,et al.  NiagaraCQ: a scalable continuous query system for Internet databases , 2000, SIGMOD '00.

[4]  David J. DeWitt,et al.  Tuple Routing Strategies for Distributed Eddies , 2003, VLDB.

[5]  Samuel Madden,et al.  Continuously adaptive continuous queries over streams , 2002, SIGMOD '02.

[6]  Edward D. Lazowska,et al.  Quantitative system performance - computer system analysis using queueing network models , 1983, Int. CMG Conference.

[7]  Yannis E. Ioannidis,et al.  Randomized algorithms for optimizing large join queries , 1990, SIGMOD '90.

[8]  Jennifer Widom,et al.  Models and issues in data stream systems , 2002, PODS.

[9]  Frederick Reiss,et al.  TelegraphCQ: Continuous Dataflow Processing for an Uncertain World , 2003, CIDR.

[10]  Johannes Gehrke,et al.  Query Processing in Sensor Networks , 2003, CIDR.

[11]  Jennifer Widom,et al.  Characterizing memory requirements for queries over continuous data streams , 2002, PODS '02.

[12]  Alon Y. Halevy,et al.  An adaptive query execution system for data integration , 1999, SIGMOD '99.

[13]  Peter J. Haas,et al.  Ripple joins for online aggregation , 1999, SIGMOD '99.

[14]  Jeffrey F. Naughton,et al.  Maximizing the Output Rate of Multi-Way Join Queries over Streaming Information Sources , 2003, VLDB.

[15]  S WeldDaniel,et al.  An adaptive query execution system for data integration , 1999 .

[16]  A. N. Wilschut,et al.  Dataflow query execution in a parallel main-memory environment , 1991, [1991] Proceedings of the First International Conference on Parallel and Distributed Information Systems.

[17]  Jeffrey F. Naughton,et al.  Evaluating window joins over unbounded streams , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[18]  Laurent Amsaleg,et al.  Cost-based query scrambling for initial delays , 1998, SIGMOD '98.

[19]  Patricia G. Selinger,et al.  Access path selection in a relational database management system , 1979, SIGMOD '79.

[20]  Wei Hong,et al.  The design of an acquisitional query processor for sensor networks , 2003, SIGMOD '03.

[21]  Jennifer Widom,et al.  The CQL continuous query language: semantic foundations and query execution , 2006, The VLDB Journal.

[22]  Rajeev Motwani,et al.  Chain: operator scheduling for memory minimization in data stream systems , 2003, SIGMOD '03.

[23]  Walid G. Aref,et al.  Scheduling for shared window joins over data streams , 2003, VLDB.

[24]  KabraNavin,et al.  Efficient mid-query re-optimization of sub-optimal query execution plans , 1998 .

[25]  Toshihide Ibaraki,et al.  On the optimal nesting order for computing N-relational joins , 1984, TODS.

[26]  David J. DeWitt,et al.  NiagaraCQ: a scalable continuous query system for Internet databases , 2000, SIGMOD 2000.

[27]  Michael J. Franklin,et al.  Streaming Queries over Streaming Data , 2002, VLDB.

[28]  Qiang Chen,et al.  Aurora : a new model and architecture for data stream management ) , 2006 .

[29]  Donald D. Chamberlin,et al.  Access Path Selection in a Relational Database Management System , 1989 .

[30]  Abhinandan Das,et al.  Approximate join processing over data streams , 2003, SIGMOD '03.

[31]  Rajeev Motwani,et al.  Load shedding for aggregation queries over data streams , 2004, Proceedings. 20th International Conference on Data Engineering.

[32]  Joseph M. Hellerstein,et al.  Eddies: continuously adaptive query processing , 2000, SIGMOD '00.

[33]  Jeffrey F. Naughton,et al.  Rate-based query optimization for streaming information sources , 2002, SIGMOD '02.

[34]  Edward D. Lazowska,et al.  Quantitative System Performance , 1985, Int. CMG Conference.