Implementing NOT EXISTS Predicates over a Probabilistic Database

Systems for managing uncertain data need to support queries with negated subgoals, which are typically expressed in SQL through the NOT EXISTS predicate. For example, the user of an RFID tracking system may want to find all RFID tags (people or objects) that have traveled from a point A to a point C without going through a point D. Such queries are difficult to support in a probabilistic database management system, because an offending tuple does not necessarily disqualify an answer; it only decreases the answer's probability. In this paper, we present an approach for supporting queries with NOT EXISTS in a probabilistic database management system by leveraging the existing query processing infrastructure. Our approach is to break up the query into multiple monotone queries, each of which can be evaluated by the existing system, and then to combine their probabilities by addition and subtraction to obtain the probability of the original query. We also describe how this technique was integrated with MystiQ, and how we incorporated the top-k multi-simulation and safe-plan optimizations.
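
The decomposition idea can be illustrated on the RFID example with a minimal sketch. It rests on the identity P(q and not s) = P(q) - P(q and s), where both queries on the right-hand side are monotone. The relation, tuple probabilities, and function names below are assumptions made for illustration only; the brute-force enumeration of possible worlds stands in for a real query engine and is not how MystiQ evaluates queries.

```python
from itertools import product

# Hypothetical tuple-independent relation of RFID sightings:
# (tag, location, time, probability that the sighting is real).
SIGHTINGS = [
    ("tag1", "A", 1, 0.9),
    ("tag1", "D", 2, 0.5),
    ("tag1", "D", 3, 0.4),
    ("tag1", "C", 4, 0.8),
]


def possible_worlds(tuples):
    """Yield (world, probability) for every subset of independent tuples."""
    for present in product([False, True], repeat=len(tuples)):
        world = []
        prob = 1.0
        for keep, (tag, loc, time, p) in zip(present, tuples):
            if keep:
                world.append((tag, loc, time))
                prob *= p
            else:
                prob *= 1.0 - p
        yield world, prob


def probability(query, tuples):
    """Sum the probabilities of the worlds in which the Boolean query holds."""
    return sum(p for world, p in possible_worlds(tuples) if query(world))


def seen(world, tag, loc):
    """Monotone subgoal: some sighting of `tag` at `loc` exists in the world."""
    return any(t == tag and l == loc for (t, l, _) in world)


def q_positive(world):
    # Monotone part of the query: tag1 was seen at A and at C.
    return seen(world, "tag1", "A") and seen(world, "tag1", "C")


def q_positive_and_sub(world):
    # Monotone query: the positive part AND the formerly negated subgoal.
    return q_positive(world) and seen(world, "tag1", "D")


def q_not_exists(world):
    # Original non-monotone query, used here only as a reference check.
    return q_positive(world) and not seen(world, "tag1", "D")


# Combine two monotone query probabilities by subtraction.
p_by_subtraction = probability(q_positive, SIGHTINGS) - probability(
    q_positive_and_sub, SIGHTINGS
)

# Direct evaluation of the negated query over all possible worlds.
p_direct = probability(q_not_exists, SIGHTINGS)

print(p_by_subtraction, p_direct)  # both approximately 0.216
```

With these assumed probabilities, the two sightings at D lower the probability of the answer from 0.72 to about 0.216 rather than eliminating it, which is the behavior the abstract describes. Queries with several negated subgoals need more terms combined by addition and subtraction; this sketch covers only the single-subgoal case.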
