A review of data quality research in achieving high data quality within organization

The aim of this review is to highlight issues in data quality research and to discuss potential research opportunities for achieving high data quality within an organization. The review adopted a systematic literature review method based on research articles published in journals and conference proceedings. We developed a review strategy around specific themes: current research areas in data quality, critical data quality dimensions, data quality management models and methodologies, and data quality assessment methods. Based on this strategy, we selected relevant research articles and extracted and synthesized their information to answer our research questions. The review traces how data quality research has advanced to reflect its real-world application and discusses the gaps that remain for future research. Research areas such as organizational management, the impact of data quality on the organization, and database-related technical solutions for data quality dominated the early years of data quality research. However, since the Internet has become a new source of information, the emergence of new research areas such as data quality assessment for the web and big data is inevitable. This review also identifies and discusses critical data quality dimensions in organizations, such as data completeness, consistency, accuracy and timeliness. We also compare data quality management models and methodologies and highlight their gaps. The capabilities of existing models and methodologies are restricted to structured data, which limits their ability to assess data quality on the web and in big data. Finally, we identify available data quality assessment methods and highlight their limitations as directions for future research. This review is important because it highlights and analyses the limitations of existing data quality research with respect to recent needs in data quality, such as unstructured data and big data.
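
To make the four dimensions named above (completeness, consistency, accuracy and timeliness) more concrete, here is a minimal Python sketch. It is an illustration only, not the method of this review or of any study it covers. It assumes a simple-ratio style of metric (one minus the share of undesirable outcomes) for the first three dimensions and a timeliness score of the form max(0, 1 - currency/volatility)^s. The sample records, the reference source, the date rule and every function name are hypothetical.

```python
from datetime import date

# Hypothetical customer records; None marks a missing value.
RECORDS = [
    {"id": 1, "email": "a@example.com", "start": date(2020, 1, 1), "end": date(2021, 1, 1)},
    {"id": 2, "email": None, "start": date(2020, 6, 1), "end": date(2020, 5, 1)},
    {"id": 3, "email": "c@example.com", "start": date(2019, 3, 1), "end": date(2022, 3, 1)},
    {"id": 4, "email": "d@example.org", "start": date(2021, 2, 1), "end": date(2021, 8, 1)},
]

# Hypothetical trusted reference source used to judge accuracy.
REFERENCE_EMAILS = {1: "a@example.com", 2: "b@example.com", 3: "c@example.com", 4: "d@example.com"}


def simple_ratio(undesirable, total):
    """Return 1 - undesirable/total; 1.0 means no defects were found."""
    return 1.0 if total == 0 else 1.0 - undesirable / total


def completeness(records, fields):
    # Share of required cells that are populated.
    cells = [r[f] for r in records for f in fields]
    missing = sum(1 for v in cells if v is None)
    return simple_ratio(missing, len(cells))


def consistency(records):
    # Assumed business rule: an end date must not precede its start date.
    violations = sum(1 for r in records if r["end"] < r["start"])
    return simple_ratio(violations, len(records))


def accuracy(records, reference):
    # A value counts as inaccurate if it differs from the reference source.
    wrong = sum(1 for r in records if r["email"] != reference.get(r["id"]))
    return simple_ratio(wrong, len(records))


def timeliness(currency_days, volatility_days, s=1.0):
    """max(0, 1 - currency/volatility) ** s; volatility_days must be positive.

    currency_days: age of the data when used; volatility_days: how long it
    stays valid; s: assumed sensitivity exponent controlling the decay.
    """
    return max(0.0, 1.0 - currency_days / volatility_days) ** s


if __name__ == "__main__":
    print(round(completeness(RECORDS, ["email", "start", "end"]), 3))  # 0.917
    print(consistency(RECORDS))  # 0.75
    print(accuracy(RECORDS, REFERENCE_EMAILS))  # 0.5
    print(timeliness(30, 120))  # 0.75
```

Keeping each dimension as a separate function reflects the view that dimensions are assessed independently and only then combined (for example by a minimum or a weighted average), since which combination suits an organization depends on its context.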