An Overview of Data Quality Frameworks

Practitioners and researchers alike widely recognize the importance of achieving and maintaining high data quality. Because of its impact on businesses, high-quality data is commonly viewed as a valuable asset. The literature offers various techniques for defining, assessing, and improving data quality. However, requirements for data and their quality vary between organizations. Because of this variety, choosing methods that suit a particular organization or context can be challenging. This paper comparatively surveys data quality frameworks with regard to the definition, assessment, and improvement of data quality, focusing on methodologies that are applicable in a wide range of business environments. To aid in deciding which of these methods is suitable, we further provide a decision guide to data quality frameworks. This guide aims to help narrow down the possible choices of data quality methodology based on a number of specified criteria.
