Evaluating probabilistic queries over imprecise data

Many applications employ sensors to monitor entities such as temperature and wind speed. A centralized database tracks these entities to enable query processing. Because the values change continuously and resources (e.g., network bandwidth and battery power) are limited, it is often infeasible to store the exact values at all times. A similar situation exists in moving-object environments, which track the constantly changing locations of objects. In such environments, database queries can produce incorrect or invalid results because they are evaluated over stale data. However, if the degree of error (or uncertainty) between the actual value and the database value is controlled, one can place more confidence in the answers to queries. More generally, query answers can be augmented with probabilistic estimates of their validity. In this paper we study probabilistic query evaluation over uncertain data. We classify queries according to the nature of their result sets and, for each class, develop algorithms for computing probabilistic answers. We address the important issue of measuring the quality of the answers to these queries, and provide algorithms for efficiently pulling data from relevant sensors or moving objects in order to improve the quality of the executing queries. Extensive experiments examine the effectiveness of several data update policies.
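
To make the idea of a probabilistic answer concrete, the following is a minimal sketch of a range query over imprecise values. It is an illustration only, not the algorithm developed in the paper: it assumes each stored value is known to lie within a bounded interval and that the true value is uniformly distributed over that interval. The names `UncertainValue`, `range_probability`, and `probabilistic_range_query` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UncertainValue:
    """A reading known only to lie in [low, high] (hypothetical model)."""
    name: str
    low: float
    high: float


def range_probability(item: UncertainValue, q_low: float, q_high: float) -> float:
    """Probability that the true value lies in [q_low, q_high].

    Assumption: the true value is uniformly distributed over
    [item.low, item.high]; other distributions would need a different integral.
    """
    width = item.high - item.low
    if width <= 0:
        # Degenerate interval: the value is known exactly.
        return 1.0 if q_low <= item.low <= q_high else 0.0
    # Length of the intersection of the two intervals, clamped at zero.
    overlap = min(item.high, q_high) - max(item.low, q_low)
    return max(0.0, overlap) / width


def probabilistic_range_query(
    items: List[UncertainValue], q_low: float, q_high: float
) -> List[Tuple[str, float]]:
    """Return (name, probability) for every item with nonzero probability."""
    results = []
    for item in items:
        p = range_probability(item, q_low, q_high)
        if p > 0.0:
            results.append((item.name, p))
    return results


sensors = [
    UncertainValue("s1", 20.0, 30.0),
    UncertainValue("s2", 10.0, 15.0),
    UncertainValue("s3", 26.0, 28.0),
]
answer = probabilistic_range_query(sensors, 25.0, 40.0)
```

Under these assumptions, `s1` is reported with probability 0.5 because half of its uncertainty interval overlaps the query range, `s3` is reported with probability 1.0, and `s2` is omitted. Narrowing an interval by pulling a fresh reading moves its probability toward 0 or 1, which is one informal way to view the improvement in answer quality that the abstract describes.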
