Data quality in internet of things: A state-of-the-art survey

In the Internet of Things (IoT), data gathered from a global-scale deployment of smart things are the basis for making intelligent decisions and providing services. If data are of poor quality, decisions are likely to be unsound. Data quality (DQ) is therefore crucial to gaining user engagement and acceptance of the IoT paradigm and its services. This paper aims to enhance DQ in IoT by providing an overview of the state of the art. Data properties and their new lifecycle in IoT are surveyed. The concept of DQ is defined, and a set of generic and domain-specific DQ dimensions suitable for assessing DQ in IoT is selected. IoT-related factors that endanger DQ, and their impact on individual DQ dimensions and on overall DQ, are exhaustively analyzed. Manifestations of DQ problems are discussed and their symptoms identified. Data outliers, a major manifestation of DQ problems, are studied along with their underlying knowledge and their impact in the context of IoT and its applications. Techniques for enhancing DQ are presented, with a special focus on data cleaning techniques, which are reviewed and compared using an extended taxonomy that outlines their characteristics and their fitness for use in IoT. Finally, open challenges and possible future research directions are discussed.
