Approximating predicates and expressive queries on probabilistic databases

We study complexity and approximation of queries in an expressive query language for probabilistic databases. The language studied supports the compositional use of confidence computation. It allows for a wide range of new use cases, such as the computation of conditional probabilities and of selections based on predicates that involve marginal and conditional probabilities. These features have important applications in areas such as data cleaning and the processing of sensor data. We establish techniques for efficiently computing approximate query results and for estimating the error incurred by queries. The central difficulty is due to selection predicates based on approximated values, which may lead to the unreliable selection of tuples. A database may contain certain singularities at which approximation of predicates cannot be achieved; however, the paper presents an algorithm that provides efficient approximation otherwise.

[1]  Christopher Ré,et al.  MYSTIQ: a system for finding more answers by using probabilities , 2005, SIGMOD '05.

[2]  Moshe Y. Vardi The complexity of relational query languages (Extended Abstract) , 1982, STOC '82.

[3]  Dan Olteanu,et al.  From complete to incomplete information and back , 2007, SIGMOD '07.

[4]  Christoph Koch,et al.  World-set decompositions: Expressiveness and efficient algorithms , 2007, Theor. Comput. Sci..

[5]  Dan Olteanu,et al.  Query language support for incomplete information in the MayBMS system , 2007, VLDB.

[6]  Jennifer Widom,et al.  ULDBs: databases with uncertainty and lineage , 2006, VLDB.

[7]  Yuri Gurevich,et al.  The complexity of query reliability , 1998, PODS.

[8]  Richard M. Karp,et al.  Monte-Carlo algorithms for enumeration and reliability problems , 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[9]  Helen J. Wang,et al.  Online aggregation , 1997, SIGMOD '97.

[10]  Joseph Y. Halpern Reasoning about uncertainty , 2003 .

[11]  Dan Olteanu,et al.  Fast and Simple Relational Processing of Uncertain Data , 2007, 2008 IEEE 24th International Conference on Data Engineering.

[12]  Christopher Ré,et al.  Efficient Top-k Query Evaluation on Probabilistic Data , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[13]  Dan Suciu,et al.  The dichotomy of conjunctive queries on probabilistic structures , 2006, PODS.

[14]  Dan Suciu,et al.  Efficient query evaluation on probabilistic databases , 2004, The VLDB Journal.

[15]  Michel de Rougemont,et al.  The Reliability of Queries. , 1995, ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems.

[16]  Chris Jermaine,et al.  Scalable approximate query processing with the DBO engine , 2008, TODS.

[17]  Prithviraj Sen,et al.  Representing and Querying Correlated Tuples in Probabilistic Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[18]  Michel de Rougemont The reliability of queries (extended abstract) , 1995, PODS '95.