Secondary-storage confidence computation for conjunctive queries with inequalities

This paper investigates the problem of efficiently computing the confidences of distinct tuples in the answers to conjunctive queries with inequalities (<) on tuple-independent probabilistic databases. This problem is fundamental to probabilistic databases and was recently stated open. Our contributions are of both theoretical and practical importance. We define a class of tractable queries with inequalities, and generalize existing results on #P-hardness of query evaluation, now in the presence of inequalities. For the tractable queries, we introduce a confidence computation technique based on efficient compilation of the lineage of the query answer into Ordered Binary Decision Diagrams (OBDDs), whose sizes are linear in the number of variables of the lineage. We implemented a secondary-storage variant of our technique in PostgreSQL. This variant does not need to materialize the OBDD, but computes, in one scan over the lineage, the probabilities of OBDD fragments and combines them on the fly. Experiments with probabilistic TPC-H data show up to two orders of magnitude improvements when compared with state-of-the-art approaches.

[1]  Richard M. Karp,et al.  An optimal algorithm for Monte Carlo estimation , 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[2]  Christopher Ré,et al.  Efficient Top-k Query Evaluation on Probabilistic Data , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[3]  Dan Olteanu,et al.  10106 Worlds and Beyond: Efficient Representation and Processing of Incomplete Information , 2007, ICDE.

[4]  Dan Suciu,et al.  The dichotomy of conjunctive queries on probabilistic structures , 2006, PODS.

[5]  Dan Olteanu,et al.  $${10^{(10^{6})}}$$ worlds and beyond: efficient representation and processing of incomplete information , 2006, 2007 IEEE 23rd International Conference on Data Engineering.

[6]  Prof. Dr. Christoph Meinel,et al.  Algorithms and Data Structures in VLSI Design , 1998, Springer Berlin Heidelberg.

[7]  Christopher Ré,et al.  Managing Uncertainty in Social Networks , 2007, IEEE Data Eng. Bull..

[8]  Pierre Marquis,et al.  A Knowledge Compilation Map , 2002, J. Artif. Intell. Res..

[9]  Peter J. Haas,et al.  MCDB: a monte carlo approach to managing uncertain data , 2008, SIGMOD Conference.

[10]  Richard M. Karp,et al.  Monte-Carlo algorithms for enumeration and reliability problems , 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[11]  Dan Olteanu,et al.  Using OBDDs for Efficient Query Evaluation on Probabilistic Databases , 2008, SUM.

[12]  Jennifer Widom,et al.  ULDBs: databases with uncertainty and lineage , 2006, VLDB.

[13]  Dan Suciu,et al.  Efficient query evaluation on probabilistic databases , 2004, The VLDB Journal.

[14]  Dan Olteanu,et al.  Conditioning probabilistic databases , 2008, Proc. VLDB Endow..

[15]  Dan Olteanu,et al.  SPROUT: Lazy vs. Eager Query Plans for Tuple-Independent Probabilistic Databases , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[16]  Dan Olteanu,et al.  MayBMS: a probabilistic database management system , 2009, SIGMOD Conference.

[17]  Dan Suciu,et al.  Management of probabilistic data: foundations and challenges , 2007, PODS '07.