Handbook of Data Quality

The issue of data quality is as old as data itself. However, the proliferation of diverse, large-scale and often publicly available data on the Web has increased the risk of poor data quality and misleading data interpretations. At the same time, data is now exposed at a much more strategic level, e.g. through business intelligence systems, which raises the stakes manifold for individuals, corporations and government agencies alike. In such settings, a lack of knowledge about data accuracy, currency or completeness can lead to erroneous and even catastrophic results. With these changes, traditional approaches to data management in general, and data quality control specifically, are challenged. There is an evident need to incorporate data quality considerations into the whole data cycle, encompassing managerial and governance aspects as well as technical ones. Data quality experts from research and industry agree that a unified framework for data quality management should bring together organizational, architectural and computational approaches. Accordingly, Sadiq structured this handbook in four parts. Part I is on organizational solutions, i.e. the development of data quality objectives for the organization, and the development of strategies to establish the roles, processes, policies and standards required to manage and ensure data quality. Part II, on architectural solutions, covers the technology landscape required to deploy the data quality management processes, standards and policies that have been developed. Part III, on computational solutions, presents effective and efficient tools and techniques related to record linkage, lineage and provenance, data uncertainty, and advanced integrity constraints. Finally, Part IV is devoted to case studies of successful data quality initiatives that highlight the various aspects of data quality in action.
The individual chapters present both an overview of the respective topic, covering historical research and/or practice and the state of the art, and specific techniques, methodologies and frameworks developed by the individual contributors. Researchers and students of computer science, information systems or business management, as well as data professionals and practitioners, will benefit most from this handbook not only by focusing on the sections relevant to their research area or practical work, but also by studying chapters that they may initially consider not directly relevant to them, where they will learn about new perspectives and approaches.


[293]  David Chidester,et al.  In His Own Words , 2003 .

[294]  Theodore Johnson,et al.  Exploratory Data Mining and Data Cleaning , 2003 .

[295]  Marta Indulska,et al.  20 Years of Data Quality Research: Themes, Trends and Synergies , 2011, ADC.

[296]  InduShobha N. Chengalur-Smith,et al.  The Impact of Experience and Time on the Use of Data Quality Information in Decision Making , 2003, Inf. Syst. Res..

[297]  H. Newcombe Record linking: the design of efficient systems for linking records into individual and family histories. , 1967, American journal of human genetics.

[298]  B. Wernerfelt,et al.  A Resource-Based View of the Firm , 1984 .

[299]  Martin J. Eppler Managing Information Quality , 2003 .

[300]  Siani Pearson,et al.  Taking account of privacy when designing cloud computing services , 2009, 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing.

[301]  Renée J. Miller,et al.  Framework for Evaluating Clustering Algorithms in Duplicate Detection , 2009, Proc. VLDB Endow..

[302]  R. Yin Case Study Research: Design and Methods , 1984 .

[303]  Christian Bizer,et al.  D2R Server - Publishing Relational Databases on the Semantic Web , 2004 .

[304]  Karel Cool,et al.  Asset stock accumulation and sustainability of competitive advantage , 1989 .

[305]  Eric Flisser,et al.  One at a time. , 2006, Fertility and sterility.

[306]  Ana Carolina Salgado,et al.  Towards a Context Ontology to Enhance Data Integration Processes , 2008, ODBIS.

[307]  John van den Hoven Information Resource Management: Stewards of Data , 1999, Inf. Syst. Manag..

[308]  P. Mouncey Improving Data Warehouse and Business Information Quality , 2001 .

[309]  Philip B. Crosby,et al.  Quality Is Free: The Art of Making Quality Certain , 1979 .

[310]  Carrie Gates,et al.  The security and privacy implications of using social networks to deliver healthcare , 2010, PETRA '10.

[311]  Diane M. Strong,et al.  AIMQ: a methodology for information quality assessment , 2002, Inf. Manag..

[312]  Jayant Madhavan,et al.  Reference reconciliation in complex information spaces , 2005, SIGMOD '05.

[313]  Sandip Debnath,et al.  Learning metadata from the evidence in an on-line citation matching scheme , 2006, Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06).

[314]  Mike Gregory,et al.  The use of maturity models/grids as a tool in assessing product development capability , 2002, IEEE International Engineering Management Conference.

[315]  Armand V. Feigenbaum,et al.  Total quality control , 1961 .

[316]  Erhard Rahm,et al.  Data Cleaning: Problems and Current Approaches , 2000, IEEE Data Eng. Bull..

[317]  Hiroshi Nakagawa,et al.  Person name disambiguation by bootstrapping , 2010, SIGIR.

[318]  Tamraparni Dasu,et al.  Statistical Distortion: Consequences of Data Cleaning , 2012, Proc. VLDB Endow..

[319]  B. J. Tepping A Model for Optimum Linkage of Records , 1968 .

[320]  Andreas Harth,et al.  Weaving the Pedantic Web , 2010, LDOW.

[321]  Hector Garcia-Molina,et al.  Shrinking the warehouse update Window , 1999, SIGMOD '99.

[322]  Yvette Salaün,et al.  Information quality: meeting the needs of the consumer , 2001, Int. J. Inf. Manag..

[323]  Christopher Ré,et al.  Large-Scale Deduplication with Constraints Using Dedupalog , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[324]  N. Carr IT doesn't matter , 2003, IEEE Engineering Management Review.

[325]  G. Kaebnick Care and feeding. , 2014, The Hastings Center report.

[326]  John M. Ward,et al.  Organizational information systems competences in small and medium-sized enterprises , 2011, Inf. Manag..

[327]  Richard Y. Wang,et al.  Data quality assessment , 2002, CACM.

[328]  Sergio Greco,et al.  Active Integrity Constraints for Database Consistency Maintenance , 2009, IEEE Transactions on Knowledge and Data Engineering.

[329]  Andrea Calì,et al.  Datalog+/-: A Family of Logical Knowledge Representation and Query Languages for New Applications , 2010, 2010 25th Annual IEEE Symposium on Logic in Computer Science.

[330]  Won Kim,et al.  Towards Quantifying Data Quality Costs , 2003, J. Object Technol..