Complexity and Approximation of Fixing Numerical Attributes in Databases Under Integrity Constraints

Consistent query answering is the problem of computing those answers from a database that are consistent with respect to integrity constraints that the database as a whole may fail to satisfy. Consistent answers are characterized as those that are invariant under minimal forms of restoring the consistency of the database. In this context, we study the problem of repairing databases by fixing integer numerical values at the attribute level with respect to denial and aggregation constraints. We introduce a quantitative definition of database fix and investigate the complexity of several decision and optimization problems, including DFP, i.e., deciding the existence of fixes within a given distance from the original instance, and CQA, i.e., deciding consistency of answers to aggregate conjunctive queries under different semantics. We provide sharp complexity bounds, identify relevant tractable cases, and introduce approximation algorithms for some of the intractable ones. More specifically, we obtain the following results: undecidability of the existence of fixes for aggregation constraints; MAXSNP-hardness of DFP, together with a good approximation algorithm for a relevant special case; and intractability of, but a good approximation for, CQA for aggregate queries under denial constraints with a single database atom (plus built-ins).
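To make the notions of fix and DFP concrete, the following is a minimal brute-force sketch and not the paper's algorithm. The abstract does not give the distance function, so the sum of squared differences used here is an assumption. The table, the attribute names `quantity` and `price`, the denial constraint in `violates`, and the search `radius` are likewise hypothetical.

```python
from itertools import product

# Hypothetical fixable integer attributes of a single-table database.
ATTRS = ("quantity", "price")

def violates(row):
    # Hypothetical single-atom denial constraint:
    # no tuple may have quantity > 5 and price > 10 at the same time.
    return row["quantity"] > 5 and row["price"] > 10

def distance(original, candidate):
    # Assumed distance: sum of squared differences over all fixable values.
    return sum((o[a] - c[a]) ** 2
               for o, c in zip(original, candidate) for a in ATTRS)

def minimal_fixes(rows, radius):
    # Enumerate every integer change of at most `radius` per value and keep
    # the consistent candidates that are closest to the original instance.
    deltas = range(-radius, radius + 1)
    best, fixes = None, []
    for shift in product(deltas, repeat=len(rows) * len(ATTRS)):
        candidate = []
        for i, row in enumerate(rows):
            new_row = dict(row)
            for j, a in enumerate(ATTRS):
                new_row[a] = row[a] + shift[i * len(ATTRS) + j]
            candidate.append(new_row)
        if any(violates(r) for r in candidate):
            continue  # still inconsistent, so not a fix
        d = distance(rows, candidate)
        if best is None or d < best:
            best, fixes = d, [candidate]
        elif d == best:
            fixes.append(candidate)
    return best, fixes

db = [{"quantity": 7, "price": 12}, {"quantity": 3, "price": 20}]
best, fixes = minimal_fixes(db, radius=3)

# DFP-style question for a given bound k: is there a fix within distance k?
k = 4
has_fix_within_k = best is not None and best <= k
```

In this toy instance only the first tuple violates the constraint, and it can be repaired either by lowering `quantity` to 5 or by lowering `price` to 10. The exhaustive search is exponential in the number of fixable values, which is in line with the hardness results summarized above and is why approximation algorithms are of interest.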
