Explaining Missing Data in Graphs: A Constraint-based Approach

This paper introduces a constraint-based approach to explaining missing values in graphs. Our method capitalizes on a set Σ of graph data constraints. An explanation is a sequence of operational enforcements of Σ that recovers missing data of interest (e.g., attribute values, edges). We show that a constraint-based approach helps us understand not only why a value is missing, but also how to recover it. We study the Σ-explanation problem, which is to compute optimal explanations with guarantees on informativeness and conciseness. We show that the problem is in $\Delta_2^P$ for established graph data constraints such as graph keys and graph association rules. We develop an efficient bidirectional algorithm that computes optimal explanations without enforcing Σ on the entire graph. We also show that our algorithm can be easily extended to support graph refinement within limited time, and to explain missing answers. Using real-world graphs, we experimentally verify the effectiveness and efficiency of our algorithms.
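
To make the notion of an explanation as a sequence of enforcement steps concrete, the following is a minimal illustrative sketch, not the algorithm of this paper. It assumes a toy graph and a deliberately simplified rule form (copy an attribute across an edge with a given label), which is much weaker than graph keys or graph association rules. The names `Rule`, `enforce_once`, and `explain` are hypothetical. The sketch runs a plain forward enforcement loop rather than the bidirectional search described above, and it makes no claim of optimality: the returned sequence may contain steps that are not needed for the target.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified constraint: if src --edge_label--> dst and src
    has attribute `attr`, then dst should carry the same value for `attr`."""
    name: str
    edge_label: str
    attr: str

def enforce_once(nodes, edges, rule):
    """Enforce `rule` on the first edge where it fills in a missing value.
    Mutates `nodes`; returns the recorded step, or None if nothing applies."""
    for src, label, dst in edges:
        if (label == rule.edge_label
                and rule.attr in nodes[src]
                and rule.attr not in nodes[dst]):
            nodes[dst][rule.attr] = nodes[src][rule.attr]
            return (rule.name, src, dst, rule.attr, nodes[dst][rule.attr])
    return None

def explain(nodes, edges, rules, target_node, target_attr):
    """Forward enforcement until the target value is recovered.
    Returns the list of steps taken, or None if the value stays missing.
    Assumption: no search for a shortest or most informative sequence."""
    nodes = {n: dict(attrs) for n, attrs in nodes.items()}  # work on a copy
    steps = []
    progress = True
    while target_attr not in nodes[target_node] and progress:
        progress = False
        for rule in rules:
            step = enforce_once(nodes, edges, rule)
            if step is not None:
                steps.append(step)
                progress = True
                break
    return steps if target_attr in nodes[target_node] else None

# Toy data (invented for illustration): "c" and "d" lack a "country" value.
nodes = {"a": {"country": "UK"}, "b": {}, "c": {}, "d": {}}
edges = [("a", "same_org", "b"), ("b", "same_org", "c")]
rules = [Rule("r1", "same_org", "country")]

recovered = explain(nodes, edges, rules, "c", "country")
unreachable = explain(nodes, edges, rules, "d", "country")
```

Here `recovered` lists two enforcement steps (from `a` to `b`, then from `b` to `c`), which doubles as an account of how the missing value can be derived. `unreachable` is `None` because no rule in the toy Σ can reach `d`.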
