Dichotomies in the Complexity of Preferred Repairs

The framework of database repairs provides a principled approach to managing inconsistencies in databases. Informally, a repair of an inconsistent database is a consistent database that differs from the inconsistent one in a "minimal way." A fundamental problem in this framework is the repair-checking problem: given two instances, is the second a repair of the first? Here, all repairs are taken into account, and they are treated on a par with each other. There are situations, however, in which it is natural and desirable to prefer one repair over another; for example, one data source is regarded as more reliable than another, or timestamp information implies that a more recent fact should be preferred over an earlier one. Motivated by these considerations, Staworko, Chomicki and Marcinkowski introduced the framework of preferred repairs. The main characteristic of this framework is that it uses a priority relation between conflicting facts of an inconsistent database to define notions of preferred repairs. In this paper we focus on the globally-optimal repairs, in the case where the constraints are functional dependencies. Intuitively, a globally-optimal repair is a repair that cannot be improved by exchanging facts with preferred facts. In this setting, it is known that there is a fixed schema (i.e., signature and functional dependencies) where globally-optimal repair-checking is coNP-complete. Our main result is a dichotomy in complexity: for each fixed relational signature and each fixed set of functional dependencies, the globally-optimal repair-checking problem either is solvable in polynomial time or is coNP-complete. Specifically, the problem is solvable in polynomial time if for each relation symbol in the signature, the functional dependencies are equivalent either to a single functional dependency or to a set of two key constraints; in all other cases, the globally-optimal repair-checking problem is coNP-complete.
We also show that there is a polynomial-time algorithm for distinguishing between the tractable and the intractable cases. The setup of preferred repairs assumes that preferences are only between conflicting facts. In the last part of the paper, we investigate the effect of this assumption on the complexity of globally-optimal repair checking. With this assumption relaxed, we give another dichotomy theorem and another polynomial-time distinguishing algorithm. Interestingly, the two dichotomies turn out to have quite different conditions for distinguishing tractability from intractability.
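To make the notion of a globally-optimal repair concrete, the following is a minimal brute-force sketch in Python. It is not the polynomial-time algorithm of the paper: it enumerates all subsets of the instance and is therefore exponential. It assumes subset repairs (maximal consistent subsets) and the usual reading of global improvement, namely that a repair J is improved by a different consistent subset J' when every fact lost from J is dominated by some preferred fact gained in J'. The encoding of facts as tuples, of functional dependencies as pairs of attribute positions, and of the priority relation as a set of (preferred, less preferred) pairs, along with all function names, are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical encoding (not from the paper): a fact is a tuple of values,
# an FD is a pair (lhs, rhs) of tuples of attribute positions, and the
# priority relation is a set of pairs (preferred_fact, less_preferred_fact).

def violates(f, g, fds):
    """True if facts f and g jointly violate some functional dependency."""
    return any(
        all(f[i] == g[i] for i in lhs) and any(f[j] != g[j] for j in rhs)
        for lhs, rhs in fds
    )

def is_consistent(facts, fds):
    """True if no two facts in the set violate an FD."""
    return not any(violates(f, g, fds) for f, g in combinations(facts, 2))

def is_repair(instance, candidate, fds):
    """Subset repair: a maximal consistent subset of the instance."""
    if not candidate <= instance or not is_consistent(candidate, fds):
        return False
    return all(not is_consistent(candidate | {f}, fds)
               for f in instance - candidate)

def is_globally_optimal(instance, candidate, fds, prefers):
    """Brute-force check (exponential): no consistent subset improves candidate."""
    if not is_repair(instance, candidate, fds):
        return False
    facts = list(instance)
    for r in range(len(facts) + 1):
        for combo in combinations(facts, r):
            other = set(combo)
            if other == candidate or not is_consistent(other, fds):
                continue
            lost, gained = candidate - other, other - candidate
            # 'other' improves 'candidate' if every lost fact is dominated
            # by some gained fact that is preferred to it.
            if all(any((g, f) in prefers for g in gained) for f in lost):
                return False
    return True

# Relation R(A, B) with the single FD A -> B (positions 0 -> 1).
fds = [((0,), (1,))]
instance = {("a", 1), ("a", 2), ("b", 3)}
prefers = {(("a", 2), ("a", 1))}  # ("a", 2) is preferred over ("a", 1)

print(is_globally_optimal(instance, {("a", 2), ("b", 3)}, fds, prefers))
print(is_globally_optimal(instance, {("a", 1), ("b", 3)}, fds, prefers))
```

In this example the two facts with A = "a" conflict, so each repair keeps exactly one of them. The repair that keeps ("a", 1) can be improved by swapping in the preferred fact ("a", 2), which is why only the other repair passes the check.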
