Computing and Maintaining Provenance of Query Result Probabilities in Uncertain Knowledge Graphs

Knowledge graphs (KGs) model relationships between entities as labeled edges (or facts). They are mostly constructed using a suite of automated extractors, which inherently introduces uncertainty into the extracted facts. Modeling this uncertainty as probabilistic confidence scores yields a probabilistic knowledge graph. Graph queries over such probabilistic KGs require computing not only the answers but also their probabilities, i.e., probabilistic inference. We propose a system, HAPPI (How Provenance of Probabilistic Inference), to handle such query processing and inference. Complying with the standard provenance semiring model, we propose a novel commutative semiring to symbolically compute the probability of a query result. These provenance-polynomial-like symbolic expressions encode fine-grained information about the probability computation process. We leverage this encoding to efficiently compute and maintain result probabilities even as the underlying KG changes. Focusing on conjunctive basic graph pattern queries, we observe that HAPPI is more efficient than knowledge compilation for commonly occurring queries whose probability derivation complexity is in the lower range. We propose an adaptive system that leverages the strengths of both HAPPI and compilation-based techniques, not only to perform probabilistic inference efficiently and compute its provenance, but also to maintain both incrementally.
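To make the idea of computing a result probability from a symbolic provenance expression concrete, the following is a minimal sketch. It is not HAPPI's semiring, which the abstract does not spell out. It assumes that facts are independent, that the lineage of an answer is stored as a sum of monomials (each monomial being the set of facts used by one derivation), and that the probability is obtained by plain inclusion-exclusion, which is exponential in the number of derivations. The fact ids, probabilities, and function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical confidence scores for three KG facts (assumed independent).
prob = {"e1": 0.9, "e2": 0.8, "e3": 0.5}

# Stored lineage of one query answer, read as the polynomial e1*e2 + e1*e3:
# the answer holds if facts {e1, e2} all hold or facts {e1, e3} all hold.
derivations = [frozenset({"e1", "e2"}), frozenset({"e1", "e3"})]


def monomial_prob(facts, prob):
    """Probability that every fact in `facts` is present (independence assumed)."""
    p = 1.0
    for f in facts:
        p *= prob[f]
    return p


def answer_probability(derivations, prob):
    """Probability that at least one derivation holds, via inclusion-exclusion."""
    total = 0.0
    for k in range(1, len(derivations) + 1):
        sign = 1 if k % 2 else -1
        for subset in combinations(derivations, k):
            # Facts needed for all derivations in `subset` to hold together.
            union = frozenset().union(*subset)
            total += sign * monomial_prob(union, prob)
    return total


p_before = answer_probability(derivations, prob)

# Maintenance sketch: a confidence score changes, and the probability is
# recomputed from the stored lineage alone, without re-running the query.
prob_updated = dict(prob, e3=0.6)
p_after = answer_probability(derivations, prob_updated)
```

Because the lineage is kept symbolically, a change to a fact's confidence only requires re-evaluating the stored expression. Here the result can be checked by hand, since the answer holds exactly when `e1` and at least one of `e2`, `e3` are present.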
