Extending inclusion dependencies with conditions

This paper introduces a class of conditional inclusion dependencies (CINDs), which extends inclusion dependencies (INDs) by enforcing patterns of semantically related data values. We show that CINDs are useful not only in data cleaning, but also in contextual schema matching. We give a full treatment of the static analysis of CINDs, and show that CINDs retain most desired properties of traditional INDs: (a) CINDs are always satisfiable; (b) CINDs are finitely axiomatizable, i.e., there exists a sound and complete inference system for the implication analysis of CINDs; and (c) the implication problem for CINDs has the same complexity as its traditional counterpart, namely, PSPACE-complete, in the absence of attributes with a finite domain; but it is EXPTIME-complete in the general setting. In addition, we investigate the interaction between CINDs and conditional functional dependencies (CFDs), as well as two practical fragments of CINDs, namely acyclic CINDs and unary CINDs. We show the following: (d) the satisfiability problem for the combination of CINDs and CFDs becomes undecidable, even in the absence of finite-domain attributes; (e) in the absence of finite-domain attributes, the implication problem for acyclic CINDs and for unary CINDs retains the same complexity as its traditional counterpart, namely, NP-complete and PTIME, respectively; but in the general setting, it becomes PSPACE-complete and coNP-complete, respectively; and (f) the implication problem for acyclic unary CINDs remains in PTIME in the absence of finite-domain attributes and coNP-complete in the general setting.
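To make the idea of an inclusion dependency restricted by data patterns concrete, here is a minimal sketch of a satisfaction check. It assumes the usual reading of a CIND with a single pattern tuple: every tuple of the source relation that matches the source-side constants must have a partner in the target relation that agrees on the embedded IND attributes and matches the target-side constants. The function name `satisfies_cind`, the dictionary-based row encoding, and the `orders`/`books` data are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]


def satisfies_cind(
    r1: List[Row],
    r2: List[Row],
    x: Sequence[str],
    y: Sequence[str],
    xp: Dict[str, str],
    yp: Dict[str, str],
) -> bool:
    """Check one CIND with a single pattern tuple (illustrative assumption).

    x, y   : attribute lists of the embedded IND  R1[x] <= R2[y]
    xp, yp : constants required on the R1 side and the R2 side
    """
    # Project the R2 tuples that match the target pattern onto y.
    targets = {
        tuple(t[a] for a in y)
        for t in r2
        if all(t[a] == v for a, v in yp.items())
    }
    for t in r1:
        # Only R1 tuples matching the source pattern are constrained.
        if all(t[a] == v for a, v in xp.items()):
            if tuple(t[a] for a in x) not in targets:
                return False
    return True


# Hypothetical data: only orders of type "book" must appear in books.
orders = [
    {"title": "Harry Potter", "type": "book", "price": "17.99"},
    {"title": "Snow White", "type": "CD", "price": "7.99"},
]
books = [{"title": "Harry Potter", "price": "17.99", "format": "hardcover"}]

attrs = ["title", "price"]
cind_holds = satisfies_cind(orders, books, attrs, attrs, {"type": "book"}, {})
ind_holds = satisfies_cind(orders, books, attrs, attrs, {}, {})
```

With empty patterns the check reduces to a traditional IND, which fails on this data because the CD order has no matching book, while the conditional version holds because the CD order falls outside the pattern.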
