A model-driven architecture-based data quality management framework for the Internet of Things

The Internet of Things (IoT) is a data stream environment in which large-scale deployments of smart things continuously report readings. These data streams are consumed by pervasive applications, i.e. data consumers, to offer ubiquitous services. Data quality (DQ) is a key criterion for IoT data consumers, especially given the inherent uncertainty of sensor-enabled data. However, DQ is a highly subjective concept, and there is no standard agreement on how to determine “good” data. Moreover, the combinations of measured attributes and associated DQ information are as diverse as the needs of data consumers. This imposes expensive overheads on data consumers who want a system built specifically for managing their DQ information. To handle these varied perceptions of DQ effectively, we propose a Model-Driven Architecture-based approach that lets data consumers express their view of DQ and its requirements as models, using an easy-to-use graphical model editor. The resulting DQ specifications are then automatically transformed to generate an entire DQ management infrastructure tailored to the data consumer's requirements. We demonstrate the flexibility and efficiency of our approach through a real-life data stream environment scenario.
