Open-World Probabilistic Databases: An Abridged Report

Large-scale probabilistic knowledge bases are becoming increasingly important in academia and industry alike. They are constantly extended with new data, powered by modern information extraction tools that associate probabilities with database tuples. In this paper, we revisit the semantics underlying such systems. In particular, the closed-world assumption of probabilistic databases, that facts not in the database have probability zero, clearly conflicts with their everyday use. To address this discrepancy, we propose an open-world probabilistic database semantics, which relaxes the probabilities of open facts to default intervals. For this open-world setting, we lift the existing data complexity dichotomy of probabilistic databases and propose an efficient evaluation algorithm for unions of conjunctive queries. We also show that query evaluation can become harder for non-monotone queries.
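To make the contrast between the two semantics concrete, the following is a minimal sketch, not the evaluation algorithm proposed in the paper. It assumes a tuple-independent database, a default interval of the form [0, lam] for every open fact (the threshold `lam`, the toy relations `R` and `S`, and all function names are illustrative assumptions), and a monotone Boolean query, for which the smallest and largest query probabilities are reached by setting all open facts to 0 and to `lam` respectively. It enumerates possible worlds by brute force, so it is only usable on tiny examples.

```python
from itertools import product


def query_probability(probs, query):
    """Exact probability of a Boolean query over a tuple-independent
    database, computed by enumerating every possible world."""
    facts = list(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(facts)):
        world = {f for f, present in zip(facts, bits) if present}
        weight = 1.0
        for f, present in zip(facts, bits):
            weight *= probs[f] if present else 1.0 - probs[f]
        if query(world):
            total += weight
    return total


def open_world_interval(db, all_facts, lam, query):
    """Probability interval of a MONOTONE query when every fact missing
    from db may take any probability in [0, lam] (assumed default interval).
    The shortcut of checking only the two endpoints is not valid for
    non-monotone queries."""
    # Lower bound: open facts get probability 0 (the closed-world answer).
    lower = query_probability({f: db.get(f, 0.0) for f in all_facts}, query)
    # Upper bound: open facts get the maximal default probability lam.
    upper = query_probability({f: db.get(f, lam) for f in all_facts}, query)
    return lower, upper


# Hypothetical toy instance: two unary relations over the domain {a, b}.
DOMAIN = ["a", "b"]
ALL_FACTS = [(rel, c) for rel in ("R", "S") for c in DOMAIN]
DB = {("R", "a"): 0.8, ("S", "a"): 0.5, ("R", "b"): 0.6}  # S(b) is open


def q(world):
    """Monotone query: exists x such that R(x) and S(x)."""
    return any(("R", c) in world and ("S", c) in world for c in DOMAIN)


LOWER, UPPER = open_world_interval(DB, ALL_FACTS, 0.3, q)
print(LOWER, UPPER)
```

Under the closed-world assumption the query has a single probability, which coincides with the lower endpoint here; the open-world reading widens it to an interval because the missing fact S(b) may hold with probability up to `lam`.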
