SlimShot: Probabilistic Inference for Web-Scale Knowledge Bases

Increasingly large Knowledge Bases (KBs) are being created by crawling the Web or other corpora of documents and by extracting facts and relations using machine learning techniques. To manage the uncertainty in the data, these KBs rely on probabilistic engines based on Markov Logic Networks (MLNs), for which probabilistic inference remains a major challenge. Today’s state-of-the-art systems use variants of MCMC, which have no theoretical error guarantees and, as we show, perform poorly in practice. In this paper we describe SlimShot (Scalable Lifted Inference and Monte Carlo Sampling Hybrid Optimization Technique), a probabilistic inference engine for Web-scale knowledge bases. SlimShot converts the MLN to a tuple-independent probabilistic database, then uses a simple Monte Carlo-based inference with three key enhancements: (1) it combines sampling with safe query evaluation, (2) it estimates a conditional probability by jointly computing the numerator and denominator, and (3) it adjusts the proposal distribution based on the sample cardinality. Together, these three techniques allow us to give formal error guarantees, and we demonstrate empirically that SlimShot outperforms today’s state-of-the-art probabilistic inference engines used in knowledge bases.
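To make enhancements (1) and (2) concrete, here is a minimal sketch on a toy tuple-independent database. It is an illustration under stated assumptions, not SlimShot's implementation. The following are hypothetical and not taken from the abstract: two unary relations R and S whose tuples are independent with probabilities `p_r` and `p_s`, a hard constraint Gamma = forall x (R(x) OR S(x)), a query Q = S(c), and the helper names `cond_given_r`, `estimate`, and `exact`. The sketch samples only the R-tuples. For each sample r it computes P(Q AND Gamma | r) and P(Gamma | r) exactly. In this toy case that reduces to a product of S-tuple probabilities, which stands in for safe query evaluation. The same samples feed both the numerator and the denominator, and the estimate of P(Q | Gamma) is their ratio. Enhancement (3), adjusting the proposal distribution, is not shown.

```python
import random
from math import prod


def cond_given_r(r, p_s, c):
    """Exact P(Q AND Gamma | R = r) and P(Gamma | R = r) for the toy example.

    Hypothetical setup: Gamma = forall x (R(x) OR S(x)), Q = S(c).
    Once R is fixed, Gamma only requires S(x) for each x with R(x) false.
    The S-tuples are independent, so both probabilities are plain products
    (this stands in for safe query evaluation on the residual query).
    """
    # P(Gamma | r): every x with R(x) false must have S(x) true.
    p_gamma = prod(p_s[x] for x in range(len(r)) if not r[x])
    # P(Q AND Gamma | r): S(c) must be true, plus S(x) for every other x
    # with R(x) false (S(c) already satisfies Gamma at c).
    p_q_gamma = p_s[c] * prod(p_s[x] for x in range(len(r)) if x != c and not r[x])
    return p_q_gamma, p_gamma


def estimate(p_r, p_s, c, n_samples, seed=0):
    """Ratio estimate of P(Q | Gamma), sampling only the R-tuples."""
    rng = random.Random(seed)
    num = 0.0
    den = 0.0
    for _ in range(n_samples):
        # Draw each R-tuple independently from its marginal probability.
        r = [rng.random() < p for p in p_r]
        q_g, g = cond_given_r(r, p_s, c)
        num += q_g  # the same sample contributes to the numerator ...
        den += g    # ... and to the denominator
    return num / den


def exact(p_r, p_s, c):
    """Closed form for the toy example: only the tuple at constant c matters."""
    return p_s[c] / (p_r[c] + (1 - p_r[c]) * p_s[c])


if __name__ == "__main__":
    p_r = [0.3, 0.5, 0.7]
    p_s = [0.4, 0.6, 0.2]
    print(estimate(p_r, p_s, c=0, n_samples=20000), exact(p_r, p_s, 0))
```

With these toy probabilities the exact answer is 0.4 / 0.58, roughly 0.690, and the sampled estimate should land close to it. Each sample contributes an exact conditional probability rather than a 0/1 indicator, which is a form of collapsed sampling. Because numerator and denominator come from the same draws, a sample under which Gamma is unlikely also contributes little to the numerator.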
