Anytime Approximation in Probabilistic Databases via Scaled Dissociations

Speeding up probabilistic inference remains a key challenge in probabilistic databases (PDBs) and the related area of statistical relational learning (SRL). Since computing probabilities for query answers is #P-hard, even for fairly simple conjunctive queries, both the PDB and SRL communities have proposed a number of approximation techniques over the years. The two prevalent techniques are either (i) MCMC-style sampling or (ii) branch-and-bound (B&B) algorithms that iteratively improve model-based bounds using a combination of variable substitution and elimination. We propose a new anytime B&B approximation scheme that encompasses all prior model-based approximation schemes proposed in the PDB and SRL literature. Our approach relies on the novel idea of "scaled dissociation" which can improve both the upper and lower bounds of existing modelbased algorithms. We apply our approach to the well-studied problem of evaluating self-join-free conjunctive queries over tuple-independent PDBs, and show a consistent reduction in approximation error in our experiments on TPC-H, Yago3, and a synthetic benchmark setting.

[1]  Dan Suciu,et al.  Probabilistic Databases with MarkoViews , 2012, Proc. VLDB Endow..

[2]  Guy Van den Broeck,et al.  Query Processing on Probabilistic Data: A Survey , 2017, Found. Trends Databases.

[3]  Bart Selman,et al.  Model Counting , 2021, Handbook of Satisfiability.

[4]  Lise Getoor,et al.  Exploiting shared correlations in probabilistic databases , 2008, Proc. VLDB Endow..

[5]  Dan Olteanu,et al.  Aggregation in Probabilistic Databases via Knowledge Compilation , 2012, Proc. VLDB Endow..

[6]  Dan Suciu,et al.  Probabilistic Event Extraction from RFID Data , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[7]  Prasoon Goyal,et al.  Probabilistic Databases , 2009, Encyclopedia of Database Systems.

[8]  Martin Theobald,et al.  Top-k query processing in probabilistic databases with non-materialized views , 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[9]  Dan Suciu,et al.  Approximate Lifted Inference with Probabilistic Databases , 2014, Proc. VLDB Endow..

[10]  Dan Suciu,et al.  Oblivious bounds on the probability of boolean functions , 2014, ACM Trans. Database Syst..

[11]  Christopher Ré,et al.  Efficient Top-k Query Evaluation on Probabilistic Data , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[12]  Christopher Ré,et al.  Approximate lineage for probabilistic databases , 2008, Proc. VLDB Endow..

[13]  Richard M. Karp,et al.  Monte-Carlo Approximation Algorithms for Enumeration Problems , 1989, J. Algorithms.

[14]  Christopher Ré,et al.  Query Evaluation on Probabilistic Databases , 2006, IEEE Data Eng. Bull..

[15]  Lise Getoor,et al.  Read-once functions and query evaluation in probabilistic databases , 2010, Proc. VLDB Endow..

[16]  Nir Friedman,et al.  Probabilistic Graphical Models - Principles and Techniques , 2009 .

[17]  Pierre Marquis,et al.  A Knowledge Compilation Map , 2002, J. Artif. Intell. Res..

[18]  Christopher Ré,et al.  Managing Probabilistic Data with MystiQ : The Can-Do , the Could-Do , and the Can ’ t-Do ? , 2008 .

[19]  Mark S. Boddy,et al.  Anytime Problem Solving Using Dynamic Programming , 1991, AAAI.

[20]  Adnan Darwiche Relax, Compensate and Then Recover: A Theory of Anytime, Approximate Inference , 2010, JELIA.

[21]  Christopher De Sa,et al.  Incremental Knowledge Base Construction Using DeepDive , 2015, The VLDB Journal.

[22]  Jian Li,et al.  Sensitivity analysis and explanations for robust query evaluation in probabilistic databases , 2011, SIGMOD '11.

[23]  K. Schittkowski,et al.  NONLINEAR PROGRAMMING , 2022 .

[24]  Wei Zhang,et al.  Knowledge vault: a web-scale approach to probabilistic knowledge fusion , 2014, KDD.

[25]  Fabian M. Suchanek,et al.  YAGO3: A Knowledge Base from Multilingual Wikipedias , 2015, CIDR.

[26]  Michael Rice,et al.  A Survey of Static Variable Ordering Heuristics for Efficient BDD / MDD Construction , 2008 .

[27]  Jennifer Widom,et al.  Databases with uncertainty and lineage , 2008, The VLDB Journal.

[28]  Adnan Darwiche,et al.  On probabilistic inference by weighted model counting , 2008, Artif. Intell..

[29]  Dan Olteanu,et al.  On the optimal approximation of queries using tractable propositional languages , 2011, ICDT '11.

[30]  Dan Olteanu,et al.  MayBMS: Managing Incomplete Information with Probabilistic World-Set Decompositions , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[31]  Dan Olteanu,et al.  Secondary-storage confidence computation for conjunctive queries with inequalities , 2009, SIGMOD Conference.

[32]  Dan Suciu,et al.  The dichotomy of probabilistic inference for unions of conjunctive queries , 2012, JACM.

[33]  Christopher Ré,et al.  MYSTIQ: a system for finding more answers by using probabilities , 2005, SIGMOD '05.

[34]  Shlomo Zilberstein,et al.  Using Anytime Algorithms in Intelligent Systems , 1996, AI Mag..

[35]  Dan Olteanu,et al.  SPROUT: Lazy vs. Eager Query Plans for Tuple-Independent Probabilistic Databases , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[36]  Haixun Wang,et al.  Probase: a probabilistic taxonomy for text understanding , 2012, SIGMOD Conference.

[37]  Norbert Fuhr,et al.  A probabilistic relational algebra for the integration of information retrieval and database systems , 1997, TOIS.

[38]  Dan Suciu,et al.  Efficient query evaluation on probabilistic databases , 2004, The VLDB Journal.

[39]  Pedro M. Domingos,et al.  Probabilistic theorem proving , 2011, UAI.

[40]  Miguel Á. Carreira-Perpiñán,et al.  Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application , 2013, ArXiv.

[41]  CACM Staff,et al.  What about statistical relational learning? , 2015, Commun. ACM.

[42]  Serge Abiteboul,et al.  Foundations of Databases , 1994 .

[43]  Dan Olteanu,et al.  Approximate confidence computation in probabilistic databases , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[44]  Dan Olteanu,et al.  Anytime approximation in probabilistic databases , 2013, The VLDB Journal.

[45]  Dan Olteanu,et al.  Using OBDDs for Efficient Query Evaluation on Probabilistic Databases , 2008, SUM.

[46]  Dan Suciu,et al.  Dissociation and propagation for approximate lifted inference with standard relational database management systems , 2013, The VLDB Journal.

[47]  Henry A. Kautz,et al.  Performing Bayesian Inference by Weighted Model Counting , 2005, AAAI.