SlimShot: In-Database Probabilistic Inference for Knowledge Bases

Increasingly large knowledge bases (KBs) are being created by crawling the Web or other document corpora and by extracting facts and relations using machine learning techniques. To manage the uncertainty in the data, these KBs rely on probabilistic engines based on Markov Logic Networks (MLNs), for which probabilistic inference remains a major challenge. Today's state-of-the-art systems use variants of MCMC, which have no theoretical error guarantees and, as we show, suffer from poor performance in practice. In this paper we describe SlimShot (Scalable Lifted Inference and Monte Carlo Sampling Hybrid Optimization Technique), a probabilistic inference engine for knowledge bases. SlimShot converts the MLN to a tuple-independent probabilistic database, then uses a simple Monte Carlo-based inference procedure with three key enhancements: (1) it combines sampling with safe query evaluation, (2) it estimates a conditional probability by jointly computing the numerator and denominator, and (3) it adjusts the proposal distribution based on the sample cardinality. In combination, these three techniques allow us to give formal error guarantees, and we demonstrate empirically that SlimShot outperforms today's state-of-the-art probabilistic inference engines used in knowledge bases.
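
To make enhancement (2) concrete, the following is a minimal sketch of estimating a conditional probability P(Q | C) = P(Q and C) / P(C) over a tuple-independent database, counting the numerator and the denominator on the same sampled worlds. It is plain Monte Carlo written for illustration, not SlimShot's actual algorithm: it omits safe query evaluation and the adjusted proposal distribution, and the function name `estimate_conditional`, its parameters, and the toy tuples `a` and `b` are all assumptions introduced here.

```python
import random


def estimate_conditional(probs, query, constraint, n_samples=20000, seed=0):
    """Estimate P(query | constraint) by naive Monte Carlo sampling.

    probs      -- dict mapping a tuple name to its marginal probability
                  (tuples are assumed independent)
    query      -- function from a sampled world (dict name -> bool) to bool
    constraint -- function from a sampled world to bool
    All names here are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    numerator = 0    # worlds satisfying both query and constraint
    denominator = 0  # worlds satisfying the constraint
    for _ in range(n_samples):
        # Sample one possible world: each tuple is present independently.
        world = {name: rng.random() < p for name, p in probs.items()}
        if constraint(world):
            denominator += 1
            if query(world):
                numerator += 1
    if denominator == 0:
        raise ValueError("constraint was never satisfied in the sample")
    # Both counts come from the same sample, so their errors are correlated.
    return numerator / denominator


if __name__ == "__main__":
    # Toy database with two independent tuples, each present with prob. 0.5.
    probs = {"a": 0.5, "b": 0.5}
    estimate = estimate_conditional(
        probs,
        query=lambda w: w["a"],
        constraint=lambda w: w["a"] or w["b"],
    )
    # The exact value is P(a) / P(a or b) = 0.5 / 0.75 = 2/3.
    print(estimate)
```

In this toy setting the estimate should land close to 2/3. Reusing one sample for both counts avoids running two separate estimators whose independent errors would compound in the ratio.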
