Generic and Declarative Approaches to Data Quality Management

Data quality assessment and data cleaning tasks have traditionally been addressed through procedural solutions, most of which apply only to specific problems and domains. In the last few years we have seen the emergence of more generic solutions, as well as of declarative, rule-based specifications of the intended outcomes of data cleaning processes. In this chapter we review some of those recent developments.