Quality Evaluation for Big Data: A Scalable Assessment Approach and First Evaluation Results

High-quality data is a prerequisite for most types of analysis provided by software systems. However, since data quality does not come for free, it has to be assessed and managed continuously. The increasing volume, variety, and velocity that characterize big data today make these tasks even more challenging. We identify challenges that are specific to big data quality assessment, with particular emphasis on its use in smart ecosystems, and propose a scalable cross-organizational approach that addresses these challenges. We developed an initial prototype to investigate scalability in a multi-node test environment using big data technologies. The observed horizontal scalability behavior suggests that the proposed approach can also cope with increasing volumes of heterogeneous data.
