Methodologies for data quality assessment and improvement

The literature provides a wide range of techniques to assess and improve the quality of data. Because these techniques are diverse and complex, recent research has focused on defining methodologies that support the selection, customization, and application of data quality assessment and improvement techniques. The goal of this article is to provide a systematic and comparative description of such methodologies. Methodologies are compared along several dimensions: the methodological phases and steps, the strategies and techniques, the data quality dimensions, the types of data, and the types of information systems addressed by each methodology. The article concludes with a summary description of each methodology.
