Query-Answer Causality in Databases and Its Connections with Reverse Reasoning Tasks in Data and Knowledge Management

Causality is an important notion that appears at the foundations of many scientific disciplines, in the practice of technology, and also in our everyday life. Causality is crucial to understanding and managing uncertainty in data, information, knowledge, and theories. In data management in particular, there is a need to represent, characterize, and compute the causes that explain why certain query results are obtained or not, or why natural semantic conditions, such as integrity constraints, are not satisfied. The notion of query-answer causality in databases was introduced in [86]. This notion has been shown to be general enough to apply to a broad class of database-related applications, such as explaining unexpected answers to a query, diagnosing network malfunctions, data cleaning, and hypothetical reasoning [86, 87, 84, 88]. In this thesis, we establish and investigate connections between query-answer causality and other important forms of reasoning that appear in data management and knowledge representation, e.g., consistency-based diagnosis [103], database repairs and consistent query answering [3], abductive diagnosis [35, 43], and the view-update problem [20, 77, 78]. These problems are classified in [83] as reverse data management problems. The unveiled relationships allow us to obtain new results for query-answer causality and also for the above-mentioned related areas. Furthermore, we argue that causality in data management can be seen as a fundamental concept to which many other data management problems and notions are connected. In fact, we propose causality as a unifying framework for reverse data management problems.

[1]  Jan Chomicki,et al.  Consistent query answers in inconsistent databases , 1999, PODS '99.

[2]  Ronald Fagin,et al.  Dichotomies in the Complexity of Preferred Repairs , 2015, PODS.

[3]  Peter Struss,et al.  Model-based Problem Solving , 2008, Handbook of Knowledge Representation.

[4]  Bert Van Nuffelen,et al.  Coherent Integration of Databases by Abductive Logic Programming , 2004, J. Artif. Intell. Res..

[5]  Marcelo Arenas,et al.  Composition and inversion of schema mappings , 2009, SGMD.

[6]  Jan Chomicki,et al.  Prioritized repairing and consistent query answering in relational databases , 2012, Annals of Mathematics and Artificial Intelligence.

[7]  Jörg Flum,et al.  Parameterized Complexity Theory , 2006, Texts in Theoretical Computer Science. An EATCS Series.

[8]  Raymond Reiter,et al.  Towards a Logical Reconstruction of Relational Database Theory , 1982, On Conceptual Modelling.

[9]  Marcelo Arenas,et al.  Relational and XML Data Exchange , 2010, Relational and XML Data Exchange.

[10]  J. Pearl Causality: Models, Reasoning and Inference , 2000 .

[11]  Evgeny Sherkhonov,et al.  High-Level Why-Not Explanations using Ontologies , 2014, PODS.

[12]  David Poole,et al.  Normality and Faults in Logic-Based Diagnosis , 1989, IJCAI.

[13]  Jeffrey F. Naughton,et al.  On the provenance of non-answers to queries over extracted data , 2008, Proc. VLDB Endow..

[14]  Leopoldo E. Bertossi,et al.  Query-Answer Causality in Databases: Abductive Diagnosis and View Updates , 2015, ACI@UAI.

[15]  Jan Chomicki,et al.  Minimal-change integrity maintenance using tuple deletions , 2002, Inf. Comput..

[16]  Klaus R. Dittrich,et al.  Data Provenance: A Categorization of Existing Approaches , 2007, BTW.

[17]  Michael Gertz,et al.  Diagnosis and Repair of Constraint Violations in Database Systems , 1996, Datenbank Rundbr..

[18]  Georg Gottlob,et al.  Abduction from Logic Programs: Semantics and Complexity , 1997, Theor. Comput. Sci..

[19]  Bertram Ludäscher,et al.  First-Order Provenance Games , 2013, In Search of Elegance in the Theory and Practice of Computation.

[20]  Georg Gottlob,et al.  Tractable database design and datalog abduction through bounded treewidth , 2010, Inf. Syst..

[21]  Joseph Y. Halpern Defaults and Normality in Causal Structures , 2008, KR.

[22]  Sanjeev Khanna,et al.  On the Propagation of Deletions and Annotations through Views , 2013 .

[23]  Dan Suciu,et al.  WHY SO? or WHY NO? Functional Causality for Explaining Query Answers , 2009, MUD.

[24]  Quoc Trung Tran,et al.  How to ConQueR why-not questions , 2010, SIGMOD Conference.

[25]  Yogesh L. Simmhan,et al.  A survey of data provenance techniques , 2005 .

[26]  Jian Li,et al.  Sensitivity analysis and explanations for robust query evaluation in probabilistic databases , 2011, SIGMOD '11.

[27]  Mark W. Krentel The Complexity of Optimization Problems , 1988, J. Comput. Syst. Sci..

[28]  Judea Pearl,et al.  An Introduction to Causal Inference , 2011, The International Journal of Biostatistics.

[29]  R. W. Wright Causation, Responsibility, Risk, Probability, Naked Statistics, and Proof: Pruning the Bramble Bush by Clarifying the Concepts, translated into Italian by Federico Stella , 1988 .

[30]  Margo I. Seltzer,et al.  Provenance: a future history , 2009, OOPSLA Companion.

[31]  Renée J. Miller,et al.  Provenance for Data Mining , 2013, TaPP.

[32]  James Cheney,et al.  Provenance in Databases: Why, How, and Where , 2009, Found. Trends Databases.

[33]  Xin He,et al.  Scalar aggregation in inconsistent databases , 2003, Theor. Comput. Sci..

[34]  Dan Suciu,et al.  Reverse data management , 2011, Proc. VLDB Endow..

[35]  R. W. Wright,et al.  Once More into the Bramble Bush: Duty, Causal Contribution, and the Extent of Legal Responsibility , 2001 .

[36]  Leopoldo E. Bertossi,et al.  Unifying Causality, Diagnosis, Repairs and View-Updates in Databases , 2014, ArXiv.

[37]  Ronald Fagin Inverting schema mappings , 2007 .

[38]  Gregory M. Provan,et al.  Approximate Model-Based Diagnosis Using Greedy Stochastic Search , 2010, J. Artif. Intell. Res..

[39]  Benny Kimelfeld,et al.  A dichotomy in the complexity of deletion propagation with functional dependencies , 2012, PODS '12.

[40]  Dan Suciu,et al.  A formal approach to finding explanations for database queries , 2014, SIGMOD Conference.

[41]  David Poole,et al.  Logic programming, abduction and probability , 1993, New Generation Computing.

[42]  Babak Salimi,et al.  From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back , 2014, Theory of Computing Systems.

[43]  Serge Abiteboul,et al.  Foundations of Databases , 1994 .

[44]  A. Honoré,et al.  Causation in the law , 1960 .

[45]  Rolf Niedermeier,et al.  An efficient fixed-parameter algorithm for 3-Hitting Set , 2003, J. Discrete Algorithms.

[46]  Letizia Tanca,et al.  Logic Programming and Databases , 1990, Surveys in Computer Science.

[47]  Peter Damaschke,et al.  The union of minimal hitting sets: Parameterized combinatorial bounds and counting , 2009, J. Discrete Algorithms.

[48]  Leopoldo E. Bertossi,et al.  Semantically Correct Query Answers in the Presence of Null Values , 2006, EDBT Workshops.

[49]  Wenfei Fan,et al.  Annotation propagation revisited for key preserving views , 2006, CIKM '06.

[50]  Joseph Y. Halpern Appropriate Causal Models and Stability of Causation , 2014, KR.

[51]  Jan Chomicki,et al.  Answer sets for consistent query answering in inconsistent databases , 2002, Theory and Practice of Logic Programming.

[52]  Sanjeev Khanna,et al.  Why and Where: A Characterization of Data Provenance , 2001, ICDT.

[53]  Val Tannen,et al.  Querying data provenance , 2010, SIGMOD Conference.

[54]  Leopoldo E. Bertossi,et al.  Characterizing and Computing Semantically Correct Answers from Databases with Annotated Logic and Answer Sets , 2001, Semantics in Databases.

[55]  Leopoldo E. Bertossi,et al.  Achieving Data Privacy through Secrecy Views and Null-Based Virtual Updates , 2011, IEEE Transactions on Knowledge and Data Engineering.

[56]  James Cheney,et al.  Is provenance logical? , 2011, LID '11.

[57]  Peter Damaschke,et al.  The union of minimal hitting sets: Parameterized combinatorial bounds and counting , 2007, J. Discrete Algorithms.

[58]  Wang Chiew Tan,et al.  Artemis: A System for Analyzing Missing Answers , 2009, Proc. VLDB Endow..

[59]  David Poole,et al.  Representing diagnosis knowledge , 1994, Annals of Mathematics and Artificial Intelligence.

[60]  Cong Yu,et al.  MRI: Meaningful Interpretations of Collaborative Ratings , 2011, Proc. VLDB Endow..

[61]  Georg Gottlob,et al.  Hypothesis Classification, Abductive Diagnosis and Therapy , 1990, Expert Systems in Engineering.

[62]  Igor Mozetic,et al.  Controlling the complexity in model-based diagnosis , 1994, Annals of Mathematics and Artificial Intelligence.

[63]  Sergio Greco,et al.  Certain Query Answering in Partially Consistent Databases , 2014, Proc. VLDB Endow..

[64]  Jennifer Widom,et al.  Tracing the lineage of view data in a warehousing environment , 2000, TODS.

[65]  Joseph Y. Halpern,et al.  Responsibility and Blame: A Structural-Model Approach , 2003, IJCAI.

[66]  Renée J. Miller,et al.  Reexamining Some Holy Grails of Data Provenance , 2011, TaPP.

[67]  Dan Suciu,et al.  Causality in Databases , 2010, IEEE Data Eng. Bull..

[68]  Peter Buneman,et al.  Provenance in databases , 2009, SIGMOD '07.

[69]  Neil Immerman A Characterization of the Complexity of Resilience and Responsibility for Conjunctive Queries , 2015 .

[70]  Dan Suciu,et al.  Bringing Provenance to Its Full Potential Using Causal Reasoning , 2011, TaPP.

[71]  Val Tannen Provenance Propagation in Complex Queries , 2013, In Search of Elegance in the Theory and Practice of Computation.

[72]  Tova Milo,et al.  QOCO: A Query Oriented Data Cleaning System with Oracles , 2015, Proc. VLDB Endow..

[73]  Joseph Y. Halpern,et al.  Causes and Explanations: A Structural-Model Approach. Part I: Causes , 2000, The British Journal for the Philosophy of Science.

[74]  Tobias Gerstenberg,et al.  Finding fault: Causality and counterfactuals in group attributions , 2012, Cognition.

[75]  Henning Fernau,et al.  Parameterized Approximation Algorithms for Hitting Set , 2011, WAOA.

[76]  Daniele Theseider Dupré,et al.  The role of abduction in database view updating , 1995, Journal of Intelligent Information Systems.

[77]  Leopoldo E. Bertossi,et al.  Complexity of Consistent Query Answering in Databases Under Cardinality-Based and Incremental Repair Semantics , 2006, ICDT.

[78]  Todd J. Green Containment of conjunctive queries on annotated relations , 2009, ICDT.

[79]  Samuel Madden,et al.  Scorpion: Explaining Away Outliers in Aggregate Queries , 2013, Proc. VLDB Endow..

[80]  Suman Nath,et al.  Tracing data errors with view-conditioned causality , 2011, SIGMOD '11.

[81]  Henning Fernau Parameterized algorithmics for d-Hitting Set , 2010, Int. J. Comput. Math..

[82]  Dan Suciu,et al.  Causality and Explanations in Databases , 2014, Proc. VLDB Endow..

[83]  Pietro Torasso,et al.  A spectrum of logical definitions of model‐based diagnosis , 1991, Comput. Intell..

[84]  Dan Suciu,et al.  PerfXplain: Debugging MapReduce Job Performance , 2012, Proc. VLDB Endow..

[85]  Floris Geerts,et al.  Cell-based Causality for Data Repairs , 2015, TaPP.

[86]  Raymond Reiter,et al.  A Theory of Diagnosis from First Principles , 1986, Artif. Intell..

[87]  David Poole,et al.  A Logical Framework for Default Reasoning , 1988, Artif. Intell..

[88]  Joseph Y. Halpern A Modification of the Halpern-Pearl Definition of Causality , 2015, IJCAI.

[89]  Jan Vondrák,et al.  Maximizing Conjunctive Views in Deletion Propagation , 2012 .

[90]  G. Williams Causation in the Law , 1961, The Cambridge Law Journal.

[91]  Leopoldo E. Bertossi,et al.  Causality in Databases: The Diagnosis and Repair Connections , 2014, ArXiv.

[92]  Tobias Gerstenberg,et al.  Spreading the blame: The allocation of responsibility amongst multiple agents , 2010, Cognition.

[93]  Wolfgang Faber,et al.  The Diagnosis Frontend of the dlv System , 1999, AI Commun..

[94]  Michael S. Moore,et al.  Causation and Responsibility , 1999, Social Philosophy and Policy.

[95]  Georg Gottlob,et al.  The complexity of logic-based abduction , 1993, JACM.

[96]  Dan Suciu,et al.  The Complexity of Causality and Responsibility for Query Answers and non-Answers , 2010, Proc. VLDB Endow..

[97]  James Cheney,et al.  Causality and the Semantics of Provenance , 2010, DCM.

[98]  Pietro Torasso,et al.  On the Relationship between Abduction and Deduction , 1991, J. Log. Comput..

[99]  Val Tannen,et al.  Provenance for database transformations , 2008, EDBT '10.

[100]  Antonis C. Kakas,et al.  Abduction in Logic Programming , 2002, Computational Logic: Logic Programming and Beyond.

[101]  Diego Calvanese,et al.  Reasoning about Explanations for Negative Query Answers in DL-Lite , 2013, J. Artif. Intell. Res..

[102]  Diego Calvanese,et al.  Explanation in DL-Lite , 2008, Description Logics.

[103]  Paolo Mancarella,et al.  Database Updates through Abduction , 1990, VLDB.

[104]  Bertram Ludäscher,et al.  Towards Constraint Provenance Games , 2014, TAPP.

[105]  R. W. Wright,et al.  Causation in Tort Law , 1985 .

[106]  Bertram Ludäscher,et al.  Towards Constraint-based Explanations for Answers and Non-Answers , 2015, TaPP.

[107]  Phokion G. Kolaitis,et al.  Repair checking in inconsistent databases: algorithms and complexity , 2009, ICDT '09.

[108]  Joseph Y. Halpern Cause, Responsibility, and Blame: A Structural-Model Approach , 2014, ArXiv.

[109]  Miroslaw Truszczynski,et al.  Answer set programming at a glance , 2011, Commun. ACM.

[110]  Jennifer Widom,et al.  Run-Time Translation of View Tuple Deletions Using Data Lineage , 2001 .

[111]  Michael Okun,et al.  On approximation of the vertex cover problem in hypergraphs , 2005, Discret. Optim..