Sequence Imputation using Machine Learning with Early Stopping Mechanism

In today’s world, people need data from every kind of situation, including conditions where humans cannot collect data directly and places that are hazardous to them. Wireless sensor networks (WSNs) can solve this problem: they gather the required information with their sensors and deliver the necessary data to their users. Missing data are unavoidable in WSNs, due to issues such as network communication outages and sensor maintenance or failure. Complete and accurate data help in making the right decisions. Recovering the missing data is not enough on its own; it is also crucial to recover it as quickly as possible. In this work, a sequence imputation model with an encoder-decoder architecture is used to impute the missing values in the sensor network. It builds on the sequence-to-sequence imputation model (SSIM), combining LSTM units, an attention layer, and an early stopping mechanism, and gives comparable results in less time by taking both past and future information into account. The training and testing data are drawn from a subset of the hourly PM2.5 dataset of the US Embassy in Beijing. The model produced accurate imputed values while keeping computation time low.
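The early stopping mechanism can be illustrated as a patience rule: training halts once the validation loss has failed to improve for a fixed number of consecutive epochs, and the best epoch seen so far is kept. The following is a minimal Python sketch under stated assumptions, not the paper's implementation. The `EarlyStopping` class, the `train` simulation, and the `patience` and `min_delta` values are hypothetical, and the list of losses stands in for the validation loss an encoder-decoder imputation model would report on held-out PM2.5 windows after each epoch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarlyStopping:
    """Patience-based early stopping on a validation loss (lower is better).

    Hypothetical helper for illustration: `patience` and `min_delta` are
    assumed hyperparameters, not values reported for this model.
    """
    patience: int = 3          # epochs to wait after the last improvement
    min_delta: float = 0.0     # minimum decrease that counts as an improvement
    best_loss: float = float("inf")
    best_epoch: Optional[int] = None
    bad_epochs: int = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember this epoch. A real training loop would
            # also checkpoint the encoder-decoder weights here.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            # No sufficient improvement: count one more "bad" epoch.
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def train(val_losses, patience=3, min_delta=0.0):
    """Simulate a training loop over precomputed validation losses.

    Returns (epochs_run, best_epoch, best_loss); epochs are zero-indexed.
    """
    stopper = EarlyStopping(patience=patience, min_delta=min_delta)
    epochs_run = 0
    for epoch, loss in enumerate(val_losses):
        epochs_run = epoch + 1
        if stopper.step(epoch, loss):
            break  # stop early; the best epoch's weights would be restored
    return epochs_run, stopper.best_epoch, stopper.best_loss


if __name__ == "__main__":
    losses = [0.90, 0.55, 0.41, 0.40, 0.42, 0.43, 0.41, 0.39, 0.38]
    print(train(losses, patience=3))  # prints (7, 3, 0.4)
```

With `patience=3`, the example loss curve stops after seven epochs and keeps the result from epoch index 3 (the fourth epoch, loss 0.40), even though the last two epochs would have improved slightly. A larger patience spends more training time for a chance at a better minimum, which is the accuracy-versus-time trade-off the early stopping mechanism is meant to manage.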
