Missing Data Problem in the Monitoring System: A Review

Missing data is common in sensor networks, especially in large-scale monitoring systems, and it arises from a variety of causes. Incomplete data can distort subsequent data processing and inference and lead to wrong decisions, so missing data recovery has long been an active research topic. Although researchers have developed many recovery methods, the problem is still far from solved. To track research progress and identify open challenges, this paper gives a detailed review of missing data recovery in the context of large-scale monitoring systems. We first introduce the basic concepts of missing data, including its definition, causes, types, and performance evaluation. We then analyze and compare a series of traditional and classical recovery methods, describing their characteristics and scope of application. Next, we present two current mainstream approaches, data recovery based on data mining algorithms and data recovery based on low-rank algorithms, covering both their methodology and the existing literature. Finally, we conclude with several promising directions for future research.
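
As a rough illustration of the low-rank family of recovery methods and of how recovery performance can be scored, the sketch below fills missing sensor readings by repeatedly projecting the data matrix onto a fixed rank and restoring the observed entries. It is not taken from any particular method in the review: the function names (low_rank_impute, rmse), the rank, the iteration count, the 20% missing rate, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np


def low_rank_impute(X, mask, rank=2, n_iter=200):
    """Fill entries where mask is False by iterative truncated SVD.

    X    : 2-D array (for example sensors x time); values where mask is
           False are ignored and may be NaN.
    mask : boolean array, True where a reading was observed.
    """
    # Start from the observed values, with missing entries set to the
    # mean of the observed readings.
    filled = np.where(mask, X, 0.0)
    filled[~mask] = X[mask].mean()
    for _ in range(n_iter):
        # Best rank-`rank` approximation of the current estimate.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed readings; take missing ones from the approximation.
        filled = np.where(mask, X, approx)
    return filled


def rmse(truth, estimate, mask):
    """Root-mean-square error computed on the missing entries only."""
    missing = ~mask
    return float(np.sqrt(np.mean((truth[missing] - estimate[missing]) ** 2)))


# Synthetic rank-2 "sensor" matrix with about 20% of readings removed.
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
mask = rng.random(truth.shape) > 0.2
observed = np.where(mask, truth, np.nan)

recovered = low_rank_impute(observed, mask, rank=2)

# Mean imputation as a simple traditional baseline for comparison.
baseline = np.where(mask, truth, truth[mask].mean())

print("low-rank RMSE:", rmse(truth, recovered, mask))
print("mean-fill RMSE:", rmse(truth, baseline, mask))
```

The error is measured only on the entries that were hidden, which is possible here because the ground truth is synthetic. With real monitoring data, a common substitute is to hide some observed readings on purpose and score the recovery on those.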
