High-Level Why-Not Explanations using Ontologies

We propose a novel foundational framework for why-not explanations, that is, explanations for why a tuple is missing from a query result. Our why-not explanations leverage concepts from an ontology to provide high-level, meaningful reasons for the missing tuple. A key algorithmic problem in our framework is that of computing a most-general explanation for a why-not question relative to an ontology, which can either be provided by the user or be derived automatically from the data and/or schema. We study the complexity of this problem and associated problems, and present concrete algorithms for computing why-not explanations. In the case where an external ontology is provided, we first show that the problem of deciding the existence of an explanation for a why-not question is NP-complete in general. However, the problem is solvable in polynomial time for queries of bounded arity, provided that the ontology is specified in a suitable language, such as a member of the DL-Lite family of description logics, which allows for efficient concept subsumption checking. Furthermore, we show that a most-general explanation can be computed in polynomial time in this case. In addition, we propose a method for deriving a suitable (virtual) ontology from a database and/or a schema, and we present an algorithm for computing a most-general explanation for a why-not question relative to such ontologies. This algorithm runs in polynomial time when concepts are defined in a selection-free language, or when the underlying schema is fixed. Finally, we study the problem of computing short most-general explanations, and we briefly discuss alternative definitions of what it means to be an explanation and to be most general.
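To make the notion of a most-general explanation concrete, the following is a minimal Python sketch and not the paper's algorithm. It assumes one common reading of the definitions: an explanation for a missing tuple is a tuple of concepts, one per position, whose extensions contain the missing tuple's values and whose Cartesian product contains no tuple of the query result; an explanation is most general if no other explanation is strictly more general under component-wise subsumption. The concept names, the `ext` and `subsumptions` structures, and the function names are all hypothetical, and the brute-force enumeration over concept tuples is only meant to illustrate why bounded arity keeps the search polynomial in the number of concepts.

```python
from itertools import product

def in_extension(values, concepts, ext):
    """True if each value belongs to the extension of the concept at its position."""
    return all(v in ext[c] for v, c in zip(values, concepts))

def is_explanation(concepts, ext, missing, answers):
    """Assumed definition: the concept tuple covers the missing tuple
    and covers no tuple of the query result."""
    if not in_extension(missing, concepts, ext):
        return False
    return not any(in_extension(ans, concepts, ext) for ans in answers)

def at_most_as_general(s, t, subsumptions):
    """Component-wise subsumption; `subsumptions` holds pairs (c, d) meaning
    c is subsumed by d, and is assumed to be transitively closed."""
    return all(a == b or (a, b) in subsumptions for a, b in zip(s, t))

def most_general_explanations(ext, subsumptions, missing, answers):
    """Enumerate all concept tuples of the query's arity (polynomially many
    when the arity is bounded) and keep the maximal explanations."""
    arity = len(missing)
    candidates = [t for t in product(ext, repeat=arity)
                  if is_explanation(t, ext, missing, answers)]
    return [t for t in candidates
            if not any(at_most_as_general(t, u, subsumptions)
                       and not at_most_as_general(u, t, subsumptions)
                       for u in candidates)]

# Hypothetical toy ontology: concept name -> set of instances.
ext = {
    "City": {"Amsterdam", "Berlin", "New York", "Chicago"},
    "EuropeanCity": {"Amsterdam", "Berlin"},
    "USCity": {"New York", "Chicago"},
    "DutchCity": {"Amsterdam"},
}
# Subsumption pairs (sub-concept, super-concept), transitively closed.
subsumptions = {
    ("EuropeanCity", "City"),
    ("USCity", "City"),
    ("DutchCity", "EuropeanCity"),
    ("DutchCity", "City"),
}
# Hypothetical query result: pairs of cities connected by train.
answers = {("Amsterdam", "Berlin"), ("Berlin", "Amsterdam"), ("New York", "Chicago")}
missing = ("Amsterdam", "New York")

result = most_general_explanations(ext, subsumptions, missing, answers)
print(result)
```

In this toy instance both (DutchCity, USCity) and (EuropeanCity, USCity) are explanations under the assumed definition, but only the latter is returned, since it is strictly more general; any pair involving City overlaps the query result and is therefore rejected.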
