Properties of Inconsistency Measures for Databases

How should we quantify the inconsistency of a database that violates integrity constraints? Proper measures are important for various tasks, such as progress indication and action prioritization in cleaning systems, and reliability estimation for new datasets. To choose an appropriate inconsistency measure, it is important to identify the properties desired in the application and to understand which of these are guaranteed or at least expected in practice. For example, in some use cases, the measured inconsistency should decrease when constraints are eliminated; in others, it should be stable and avoid jitters and jumps in reaction to small changes in the database. We embark on a systematic investigation of properties for database inconsistency measures. We investigate a collection of basic measures that have been proposed in the past in both the Knowledge Representation and Database communities, analyze their theoretical properties, and empirically observe their behavior in an experimental study. We also demonstrate how the framework can lead to new inconsistency measures by introducing a new measure that, in contrast to the rest, satisfies all of the properties we consider and can be computed in polynomial time.
