Tracking the Impact of Fact Deletions on Knowledge Graph Queries using Provenance Polynomials

Critical business applications in domains ranging from technical support to healthcare increasingly rely on large-scale, automatically constructed knowledge graphs. These applications use the results of complex queries over knowledge graphs to help users make crucial decisions, such as which drug to administer or whether certain actions comply with all regulatory requirements. However, these knowledge graphs constantly evolve, and newer versions may adversely affect the query results on which earlier business decisions were based. We propose a framework based on provenance polynomials to track the impact of knowledge graph changes on arbitrary SPARQL query results. Focusing on the deletion of facts, we show how to efficiently determine the queries impacted by a change, develop ways to incrementally maintain these polynomials, and present an efficient implementation on top of RDF graph databases. Our experimental evaluation over large-scale RDF/SPARQL benchmarks shows the effectiveness of our proposal.
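To make the idea concrete, the following is a minimal sketch of how a provenance polynomial can reveal the effect of a fact deletion. It is not the paper's implementation. It assumes each query answer is annotated with a polynomial over fact identifiers, where each monomial is one way of deriving the answer, and that deleting a fact amounts to setting its variable to zero. The names `make_poly`, `is_impacted`, `delete_fact`, `still_derivable`, and `impacted_answers`, as well as the fact identifiers `t1` to `t4`, are hypothetical.

```python
from collections import Counter

def make_poly(*monomials):
    """Build a polynomial as a map from monomial to coefficient.
    A monomial is a sorted tuple of fact ids (one derivation of the answer)."""
    poly = Counter()
    for m in monomials:
        poly[tuple(sorted(m))] += 1
    return poly

def is_impacted(poly, fact):
    """An answer is impacted by a deletion if the fact occurs in any monomial."""
    return any(fact in m for m in poly)

def delete_fact(poly, fact):
    """Set the fact's variable to zero: every monomial that uses it vanishes."""
    return Counter({m: c for m, c in poly.items() if fact not in m})

def still_derivable(poly):
    """The answer survives as long as at least one derivation remains."""
    return len(poly) > 0

def impacted_answers(answers, fact):
    """Return the ids of all answers whose polynomial mentions the fact."""
    return sorted(a for a, p in answers.items() if is_impacted(p, fact))

# Hypothetical annotations: ans1 = t1*t2 + t3, ans2 = t4
answers = {
    "ans1": make_poly(["t1", "t2"], ["t3"]),
    "ans2": make_poly(["t4"]),
}

affected = impacted_answers(answers, "t3")        # only ans1 mentions t3
after_t3 = delete_fact(answers["ans1"], "t3")     # t1*t2 remains
after_t1 = delete_fact(after_t3, "t1")            # no derivation remains
```

In this sketch, deleting `t3` impacts `ans1` but leaves it derivable through `t1*t2`, while additionally deleting `t1` removes the answer entirely. Updating the stored polynomial in place, rather than re-running the query, is the kind of incremental maintenance the abstract refers to.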
