An unsupervised neural network approach for imputation of missing values in univariate time series data

Handling missing values in time series data plays a key role in prediction and forecasting, because complete and clean historical data help achieve higher accuracy. Numerous studies address imputation in multivariate time series, but imputation in univariate time series has received far less attention because no correlated variables are available. This article proposes an iterative imputation algorithm that clusters univariate time series data using the trend, seasonal, cyclical, and residual features of the data. The method then applies a similarity-based nearest neighbor imputation approach within each cluster to fill the missing values. It is evaluated on publicly available data sets from the data market repository and the UCI repository by randomly simulating missing patterns at low, moderate, and high missingness rates throughout the series. The outcome is assessed with the imputeTestbench package, using root mean squared error as the error metric, and validated through prediction accuracy and the concordance correlation coefficient. Experimental results show that the proposed technique produces values closer to the original time series, resulting in lower error rates than other existing imputation methods.
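
The evaluation protocol described above (randomly remove values, impute, then score against the original series) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic series, the missingness rates 0.1, 0.3, and 0.5, and the helper names are all hypothetical, and plain linear interpolation stands in for the proposed cluster-based nearest neighbor imputation. Only the two metrics, root mean squared error and Lin's concordance correlation coefficient, come from the abstract.

```python
import numpy as np

def simulate_missing(series, rate, rng):
    """Return a copy of `series` with a fraction `rate` of values set to NaN."""
    out = series.astype(float).copy()
    n_missing = int(round(rate * out.size))
    # Keep both endpoints observed so the interpolation stand-in is well defined.
    idx = rng.choice(np.arange(1, out.size - 1), size=n_missing, replace=False)
    out[idx] = np.nan
    return out, idx

def interpolate_baseline(series):
    """Stand-in imputer (linear interpolation), NOT the proposed method."""
    out = series.copy()
    missing = np.isnan(out)
    positions = np.arange(out.size)
    out[missing] = np.interp(positions[missing], positions[~missing], out[~missing])
    return out

def rmse(truth, estimate):
    """Root mean squared error between two equally long arrays."""
    return float(np.sqrt(np.mean((truth - estimate) ** 2)))

def concordance_cc(truth, estimate):
    """Lin's concordance correlation coefficient (population moments)."""
    mx, my = truth.mean(), estimate.mean()
    cov = np.mean((truth - mx) * (estimate - my))
    return float(2 * cov / (truth.var() + estimate.var() + (mx - my) ** 2))

# Hypothetical univariate series: linear trend plus a 12-step seasonal cycle.
rng = np.random.default_rng(0)
t = np.arange(240)
original = 0.05 * t + np.sin(2 * np.pi * t / 12)

# Assumed low, moderate, and high missingness rates.
for rate in (0.1, 0.3, 0.5):
    holed, idx = simulate_missing(original, rate, rng)
    filled = interpolate_baseline(holed)
    # RMSE is scored on the removed positions only; CCC on the full series.
    print(rate, rmse(original[idx], filled[idx]), concordance_cc(original, filled))
```

Scoring the error only at the removed positions keeps the observed values from diluting it, while the concordance coefficient on the full series measures how closely the completed series agrees with the original; other conventions are equally possible.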
