Foundations of Data Quality Management

Data quality is one of the most important problems in data management. A database system typically aims to support the creation, maintenance, and use of large amounts of data, focusing on the quantity of data. However, real-life data are often dirty: inconsistent, duplicated, inaccurate, incomplete, or stale. Dirty data in a database routinely generate misleading or biased analytical results and decisions, and lead to loss of revenue, credibility, and customers. With this comes the need for data quality management. In contrast to traditional data management tasks, data quality management enables the detection and correction of errors in the data, syntactic or semantic, in order to improve the quality of the data and hence add value to business processes. While data quality has been a problem for decades, the prevalent use of the Web has increased the risks, on an unprecedented scale, of creating and propagating dirty data.

This monograph gives an overview of fundamental issues underlying central aspects of data quality, namely, data consistency, data deduplication, data accuracy, data currency, and information completeness. We promote a uniform logical framework for dealing with these issues, based on data quality rules. The text is organized into seven chapters, focusing on relational data. Chapter One introduces data quality issues. A conditional dependency theory is developed in Chapter Two for capturing data inconsistencies. It is followed in Chapter Three by practical techniques for discovering conditional dependencies, and for detecting inconsistencies and repairing data based on conditional dependencies. Matching dependencies are introduced in Chapter Four as matching rules for data deduplication. A theory of relative information completeness is studied in Chapter Five, revising the classical Closed World Assumption and the Open World Assumption to characterize incomplete information in the real world. A data currency model is presented in Chapter Six to identify the current values of entities in a database and to answer queries with the current values in the absence of reliable timestamps. Finally, interactions between these data quality issues are explored in Chapter Seven.

Important theoretical results and practical algorithms are covered, but formal proofs are omitted. The bibliographical notes contain pointers to papers in which the results were presented and proven, as well as references to materials for further reading. This text is intended for a seminar course at the graduate level. It is also intended to serve as a useful resource for researchers and practitioners who are interested in the study of data quality. The fundamental research on data quality draws on several areas, including mathematical logic, computational complexity, and database theory. It has raised as many questions as it has answered, and remains a rich source of open problems.

Table of Contents: Data Quality: An Overview / Conditional Dependencies / Cleaning Data with Conditional Dependencies / Data Deduplication / Information Completeness / Data Currency / Interactions between Data Quality Issues
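To make the idea of a data quality rule more concrete, the following is a minimal sketch of how an inconsistency might be detected with a conditional dependency. It is an illustration only and is not taken from the monograph. The function name `cfd_violations`, the column names, and the sample rows are all hypothetical. The rule checked here says that, among rows satisfying a condition (country code 44), equal zip codes should imply equal street names. Rows outside the condition are not constrained.

```python
from collections import defaultdict


def cfd_violations(rows, condition, lhs, rhs):
    """Find groups of rows that break a conditional dependency.

    Rule being checked (illustrative): among rows whose attributes match
    `condition`, rows that agree on all `lhs` attributes must also agree
    on the `rhs` attribute.
    """
    groups = defaultdict(list)
    for row in rows:
        # Only rows matching the condition are subject to the rule.
        if all(row.get(attr) == value for attr, value in condition.items()):
            key = tuple(row[attr] for attr in lhs)
            groups[key].append(row)
    # A group violates the rule if it holds more than one distinct rhs value.
    return {
        key: group
        for key, group in groups.items()
        if len({row[rhs] for row in group}) > 1
    }


# Hypothetical sample data: two rows with cc == "44" share a zip code but
# disagree on the street; the cc == "01" rows are outside the condition.
customers = [
    {"cc": "44", "zip": "EH4 8LE", "street": "Mayfield Rd"},
    {"cc": "44", "zip": "EH4 8LE", "street": "Crichton St"},
    {"cc": "01", "zip": "07974", "street": "Mountain Ave"},
    {"cc": "01", "zip": "07974", "street": "Main St"},
]

violations = cfd_violations(customers, {"cc": "44"}, ["zip"], "street")
for key, group in violations.items():
    print(key, [row["street"] for row in group])
```

In this sketch only the two rows with country code 44 are reported, because the condition restricts where the dependency applies. Detecting a violation does not say which of the conflicting values is correct; repairing the data is a separate step.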
