Bridging the gap between intensional and extensional query evaluation in probabilistic databases

There are two broad approaches to query evaluation over probabilistic databases. (1) Intensional methods manipulate expressions over the symbolic events associated with uncertain tuples. This approach is very general and applies to any query, but it requires an expensive post-processing phase involving general-purpose probabilistic inference. (2) Extensional methods instead translate the operations over symbolic events into a query plan; they scale well but are restricted to safe queries. In this paper, we bridge this gap by proposing an approach that translates the evaluation of any query into extensional operators, followed by a post-processing step that requires probabilistic inference. Our approach uses characteristics of the data to adapt smoothly between the two evaluation strategies. If the query is safe, or becomes safe because of the data instance, then the evaluation is completely extensional and runs inside the database. If the query/data combination departs from the ideal setting of a safe query, then some intensional processing is performed, whose complexity depends only on the distance from that ideal setting.
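
To make the contrast between the two strategies concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a tuple-independent database and the safe Boolean query q :- R(x), S(x, y). The relations `R` and `S`, their probabilities, and the function names `extensional` and `intensional` are hypothetical and chosen only for illustration. The extensional function computes the probability with a fixed plan of independent projections and a join. The intensional function builds the lineage (a DNF over tuple events) and performs inference by brute-force enumeration of possible worlds, which is exponential in the number of tuples.

```python
from itertools import product

# Hypothetical tuple-independent relations: tuple -> marginal probability.
R = {("a",): 0.5, ("b",): 0.8}
S = {("a", 1): 0.4, ("a", 2): 0.5, ("b", 1): 0.9}


def extensional(R, S):
    """Safe plan for q :- R(x), S(x, y).

    Independent project on y, join with R on x, independent project on x.
    Each step is plain arithmetic on probabilities, so it could run as an
    ordinary relational plan.
    """
    p_no_answer = 1.0
    for (x,), p_r in R.items():
        # Probability that no S-tuple with this x is present.
        p_no_s = 1.0
        for (sx, _y), p_s in S.items():
            if sx == x:
                p_no_s *= 1.0 - p_s
        # Probability that x contributes a witness: R(x) and some S(x, y).
        p_x = p_r * (1.0 - p_no_s)
        p_no_answer *= 1.0 - p_x
    return 1.0 - p_no_answer


def intensional(R, S):
    """Build the lineage of q and evaluate it by enumerating possible worlds."""
    prob = {("R", t): p for t, p in R.items()}
    prob.update({("S", t): p for t, p in S.items()})
    events = list(prob)
    # Lineage: one conjunct per joining pair of tuples.
    clauses = [(("R", r), ("S", s)) for r in R for s in S if r[0] == s[0]]
    total = 0.0
    for world in product([False, True], repeat=len(events)):
        truth = dict(zip(events, world))
        if any(truth[a] and truth[b] for a, b in clauses):
            weight = 1.0
            for e in events:
                weight *= prob[e] if truth[e] else 1.0 - prob[e]
            total += weight
    return total


p_ext = extensional(R, S)
p_int = intensional(R, S)
```

On this toy instance both functions return the same probability, because the query is safe and the extensional plan is therefore exact. For an unsafe query no such plan exists in general, and only the lineage-based route remains correct; this is the gap the abstract describes.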
