Approximate Lifted Inference with Probabilistic Databases

This paper proposes a new approach for approximate evaluation of #P-hard queries with probabilistic databases. In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enumerate only the minimal necessary plans among all possible plans. Importantly, this algorithm is a strict generalization of all known results of PTIME self-join-free conjunctive queries: A query is safe if and only if our algorithm returns one single plan. We also apply three relational query optimization techniques to evaluate all minimal safe plans very fast. We give a detailed experimental evaluation of our approach and, in the process, provide a new way of thinking about the value of probabilistic methods over non-probabilistic methods for ranking query answers.

[1]  Dan Olteanu,et al.  A dichotomy for non-repeating queries with negation in probabilistic databases , 2014, PODS.

[2]  Jerry Li,et al.  Exact Model Counting of Query Expressions , 2017, ACM Trans. Database Syst..

[3]  Patricia G. Selinger,et al.  Access path selection in a relational database management system , 1979, SIGMOD '79.

[4]  Christopher Ré,et al.  Towards high-throughput gibbs sampling at scale: a study across storage managers , 2013, SIGMOD '13.

[5]  Bart Selman,et al.  Model Counting , 2021, Handbook of Satisfiability.

[6]  Dan Suciu,et al.  Oblivious bounds on the probability of boolean functions , 2014, ACM Trans. Database Syst..

[7]  Gerhard Weikum,et al.  YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia: Extended Abstract , 2013, IJCAI.

[8]  Christopher Ré,et al.  Approximate lineage for probabilistic databases , 2008, Proc. VLDB Endow..

[9]  Pedro M. Domingos,et al.  Probabilistic theorem proving , 2011, UAI.

[10]  Luc De Raedt,et al.  Lifted Probabilistic Inference by First-Order Knowledge Compilation , 2011, IJCAI.

[11]  Dan Olteanu,et al.  SPROUT: Lazy vs. Eager Query Plans for Tuple-Independent Probabilistic Databases , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[12]  Tova Milo,et al.  Uncertainty in Crowd Data Sourcing Under Structural Constraints , 2014, DASFAA Workshops.

[13]  Dan Olteanu,et al.  On the optimal approximation of queries using tractable propositional languages , 2011, ICDT '11.

[14]  Dan Olteanu,et al.  MayBMS: Managing Incomplete Information with Probabilistic World-Set Decompositions , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[15]  Marc Najork,et al.  Computing Information Retrieval Performance Measures Efficiently in the Presence of Tied Scores , 2008, ECIR.

[16]  Danai Koutra,et al.  Linearized and Single-Pass Belief Propagation , 2014, Proc. VLDB Endow..

[17]  Christoph Koch,et al.  PIP: A database system for great and small expectations , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[18]  Christopher Ré,et al.  Efficient Top-k Query Evaluation on Probabilistic Data , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[19]  Chris Jermaine,et al.  Sampling-based estimators for subset-based queries , 2008, The VLDB Journal.

[20]  Peter J. Haas,et al.  MCDB: a monte carlo approach to managing uncertain data , 2008, SIGMOD Conference.

[21]  Carlo Zaniolo,et al.  The analytical bootstrap: a new method for fast error estimation in approximate query processing , 2014, SIGMOD Conference.

[22]  Christopher Ré,et al.  Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS , 2011, Proc. VLDB Endow..

[23]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[24]  Guy Van den Broeck,et al.  Skolemization for Weighted First-Order Model Counting , 2013, KR.

[25]  David Poole,et al.  First-order probabilistic inference , 2003, IJCAI.

[26]  Guy Van den Broeck,et al.  Lifted Relax, Compensate and then Recover: From Approximate to Exact Lifted Probabilistic Inference , 2012, UAI.

[27]  Dan Suciu,et al.  Dissociation and Propagation for Efficient Query Evaluation over Probabilistic Databases , 2013, MUD.

[28]  Geoffrey J. Gordon,et al.  Relational learning via collective matrix factorization , 2008, KDD.

[29]  Wei Zhang,et al.  Knowledge vault: a web-scale approach to probabilistic knowledge fusion , 2014, KDD.

[30]  Moritz Schröder,et al.  7 Conclusions and outlook , 2015 .

[31]  Daisy Zhe Wang,et al.  Knowledge expansion over probabilistic knowledge bases , 2014, SIGMOD Conference.

[32]  Dan Suciu,et al.  The dichotomy of probabilistic inference for unions of conjunctive queries , 2012, JACM.

[33]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[34]  Norbert Fuhr,et al.  A probabilistic relational algebra for the integration of information retrieval and database systems , 1997, TOIS.

[35]  Dan Suciu,et al.  Efficient query evaluation on probabilistic databases , 2004, The VLDB Journal.

[36]  Dan Suciu,et al.  Bridging the gap between intensional and extensional query evaluation in probabilistic databases , 2010, EDBT '10.

[37]  Guy Van den Broeck,et al.  Liftability of Probabilistic Inference: Upper and Lower Bounds , 2012 .

[38]  N. J. A. Sloane,et al.  The On-Line Encyclopedia of Integer Sequences , 2003, Electron. J. Comb..

[39]  Lise Getoor,et al.  Read-once functions and query evaluation in probabilistic databases , 2010, Proc. VLDB Endow..

[40]  Val Tannen,et al.  Faster query answering in probabilistic databases using read-once functions , 2010, ICDT '11.

[41]  Prithviraj Sen,et al.  Representing and Querying Correlated Tuples in Probabilistic Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[42]  Tova Milo,et al.  Deriving probabilistic databases with inference ensembles , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[43]  Pedro M. Domingos,et al.  Markov Logic: An Interface Layer for Artificial Intelligence , 2009, Markov Logic: An Interface Layer for Artificial Intelligence.

[44]  Dan Suciu,et al.  Probabilistic Databases with MarkoViews , 2012, Proc. VLDB Endow..

[45]  Pedro M. Domingos,et al.  Formula-Based Probabilistic Inference , 2010, UAI.

[46]  Prasoon Goyal,et al.  Probabilistic Databases , 2009, Encyclopedia of Database Systems.

[47]  Dan Olteanu,et al.  Fast and Simple Relational Processing of Uncertain Data , 2007, 2008 IEEE 24th International Conference on Data Engineering.

[48]  Guy Van den Broeck,et al.  Lifted probabilistic inference in relational models (UAI tutorial) , 2014 .

[49]  Dan Olteanu,et al.  Approximate confidence computation in probabilistic databases , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[50]  Dan Olteanu,et al.  Using OBDDs for Efficient Query Evaluation on Probabilistic Databases , 2008, SUM.

[51]  Estevam R. Hruschka,et al.  Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.