Enhancing Veracity of IoT Generated Big Data in Decision Making

Data are crucial for supporting decision making. If data have low veracity, decisions are unlikely to be sound. The Internet of Things (IoT) generates big data that suffer from inaccuracy, inconsistency, incompleteness, deception, and model approximation. Enhancing data veracity is important for addressing these challenges. In this article, we summarize the key characteristics and challenges of IoT that affect data processing and decision making. We also review existing approaches to measuring and enhancing data veracity and to mining uncertain data streams. Finally, we propose five recommendations for the future development of veracious big IoT data analytics. These recommendations concern the heterogeneous and distributed nature of IoT data; autonomous decision making; context-aware and domain-optimized methodologies; data cleaning and processing techniques for IoT edge devices; and privacy-preserving, personalized, and secure data management.
