A model-driven framework for data quality management in the Internet of Things

The Internet of Things (IoT) is a data stream environment in which large-scale deployments of smart things continuously report readings. These data streams are then consumed by pervasive applications, i.e. data consumers, to offer ubiquitous services. Data quality (DQ) is a key criterion for IoT data consumers, especially given the inherent uncertainty of sensor-enabled data. However, DQ is a highly subjective concept, and there is no standard agreement on what constitutes “good” data. Moreover, the combinations of measured attributes and associated DQ information are as diverse as the needs of data consumers. This imposes costly overhead on developers tasked with building DQ-aware IoT software systems that are capable of managing their own DQ information. To handle these varying perceptions of DQ effectively, we propose a Model-Driven Architecture-based approach that allows each developer to express the data consumer’s vision of DQ and its requirements through models and other provided resources, using an easy-to-use graphical model editor. The defined DQ specifications are then automatically transformed to generate an entire DQ management infrastructure tailored to the data consumer’s requirements. We demonstrate the flexibility and efficiency of our approach by generating two DQ management infrastructures built on top of different platforms and testing them in a real-life data stream scenario.
