Exploiting correlated attributes in acquisitional query processing

Sensor networks and other distributed information systems (such as the Web) must frequently access data that has a high per-attribute acquisition cost, in terms of energy, latency, or computational resources. When executing queries that contain several predicates over such expensive attributes, we observe that it can be beneficial to use correlations to automatically introduce low-cost attributes whose observation allows the query processor to better estimate the selectivity of these expensive predicates. In particular, we show how to build conditional plans that branch into one or more sub-plans, each with a different ordering of the expensive query predicates, based on the runtime observation of low-cost attributes. We frame the construction of the optimal conditional plan for a given user query and set of candidate low-cost attributes as an optimization problem. We describe an exponential-time algorithm for finding such optimal plans and a polynomial-time heuristic for identifying conditional plans that perform well in practice. We also show how to compactly model the conditional probability distributions needed to identify correlations and build these plans. We evaluate our algorithms on several real-world sensor-network data sets, showing performance improvements of several times over traditional optimization techniques for a variety of queries.
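To make the idea of a conditional plan concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a small sample of historical readings, two hypothetical expensive predicates (`light` and `temp`, each with an assumed acquisition cost of 10), and one hypothetical cheap attribute (`period`, assumed cost 1). It picks a predicate ordering by brute force over all permutations, first for the whole sample and then separately for each observed value of the cheap attribute.

```python
from itertools import permutations

# Hypothetical expensive predicates: name -> (acquisition cost, test).
PREDICATES = {
    "light": (10.0, lambda r: r["light"] > 100),
    "temp": (10.0, lambda r: r["temp"] < 20),
}

# Hypothetical historical sample; "period" is the cheap attribute.
ROWS = (
    [{"period": "night", "light": 5, "temp": 15}] * 4
    + [{"period": "day", "light": 500, "temp": 30}] * 4
)


def plan_cost(order, rows):
    """Average cost per row when predicates run in `order`,
    stopping at the first predicate that fails."""
    total = 0.0
    for row in rows:
        for name in order:
            cost, test = PREDICATES[name]
            total += cost
            if not test(row):
                break
    return total / len(rows)


def best_order(rows):
    """Brute-force search over all orderings (exponential in the
    number of predicates; used here only for illustration)."""
    return min(permutations(sorted(PREDICATES)),
               key=lambda order: plan_cost(order, rows))


def conditional_plan(rows, cheap_attr):
    """Map each observed value of the cheap attribute to the best
    ordering for the rows that share that value."""
    branches = {}
    for row in rows:
        branches.setdefault(row[cheap_attr], []).append(row)
    return {value: best_order(sub) for value, sub in branches.items()}


def conditional_cost(plan, rows, cheap_attr, cheap_cost):
    """Average cost per row: always pay for the cheap attribute,
    then run the ordering chosen for its observed value."""
    total = 0.0
    for value, order in plan.items():
        sub = [r for r in rows if r[cheap_attr] == value]
        total += plan_cost(order, sub) * len(sub)
    return cheap_cost + total / len(rows)


single = best_order(ROWS)
plan = conditional_plan(ROWS, "period")
print("single ordering:", single, plan_cost(single, ROWS))
print("conditional plan:", plan,
      conditional_cost(plan, ROWS, "period", cheap_cost=1.0))
```

In this contrived sample the light predicate always fails at night and the temperature predicate always fails during the day, so observing the cheap attribute first lets each branch evaluate the failing predicate first. Any single ordering averages 15 cost units per row, while the conditional plan averages 11, including the cost of reading `period`.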
