Big Data Validation and Quality Assurance -- Issues, Challenges, and Needs

With the rapid advance of big data technology and analytics solutions, big data computing and services have become an active research and application topic in academia, industry, and government. Nevertheless, data quality problems are increasing, and erroneous data imposes costs on enterprises and businesses. Current research seldom discusses how to validate big data effectively to ensure data quality. This paper discusses big data validation and quality assurance, including the essential concepts, focuses, and validation process. It also compares big data validation tools and discusses several major industry players. Finally, it discusses the primary issues, challenges, and needs.
