Data Quality: Theory and Practice

Real-life data are often dirty: inconsistent, inaccurate, incomplete, stale and duplicated. Dirty data have long been an issue, and the prevalent use of the Internet has increased, on an unprecedented scale, the risk of creating and propagating them. Dirty data are reported to cost US industry billions of dollars each year, and there is no reason to believe that the problem is any smaller in other societies that depend on information technology. Hence the need to improve data quality, a topic as important as the traditional data management tasks that cope with the quantity of data.
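
To make three of the kinds of dirtiness listed above concrete (incomplete, duplicated and inconsistent data), here is a minimal Python sketch over hypothetical toy records. The field names, the crude normalization used to spot duplicates, and the rule that a zip code determines a city are all assumptions chosen for illustration. Stale and inaccurate data are not covered, since detecting them needs timestamps or trusted reference data that these toy records lack.

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"id": 1, "name": "Alice Smith", "zip": "10001", "city": "New York"},
    {"id": 2, "name": "alice smith ", "zip": "10001", "city": "New York"},
    {"id": 3, "name": "Bob Jones", "zip": "10001", "city": "Boston"},
    {"id": 4, "name": "Carol White", "zip": None, "city": "Chicago"},
]


def incomplete(rows, fields):
    """Ids of rows with a missing (None or empty) value in any of `fields`."""
    return [r["id"] for r in rows if any(r.get(f) in (None, "") for f in fields)]


def duplicates(rows, key_fields):
    """Groups of ids whose key fields coincide after lower-casing and trimming.
    This exact-match test is a toy stand-in for real record matching."""
    groups = defaultdict(list)
    for r in rows:
        key = tuple(str(r[f]).strip().lower() for f in key_fields)
        groups[key].append(r["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def inconsistent(rows, lhs, rhs):
    """Pairs of ids that agree on `lhs` but disagree on `rhs`, i.e. that
    violate the assumed rule "lhs determines rhs". Rows missing lhs are skipped."""
    by_lhs = defaultdict(list)
    for r in rows:
        if r.get(lhs) is not None:
            by_lhs[r[lhs]].append(r)
    pairs = []
    for group in by_lhs.values():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                if group[i][rhs] != group[j][rhs]:
                    pairs.append((group[i]["id"], group[j]["id"]))
    return pairs


print(incomplete(records, ["name", "zip", "city"]))  # -> [4]
print(duplicates(records, ["name", "zip"]))          # -> [[1, 2]]
print(inconsistent(records, "zip", "city"))          # -> [(1, 3), (2, 3)]
```

Note that the kinds of dirtiness interact: records 1 and 2 are duplicates, and each of them conflicts with record 3 on the city for zip 10001, so deciding which value to trust already depends on how duplicates are resolved.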
