Dynamic time warping-based imputation for univariate time series data

Time series with missing values occur in almost every domain of the applied sciences. Ignoring missing values can lead to a loss of efficiency and to unreliable results, especially when large sub-sequences are missing. This paper proposes an approach for filling large gaps in univariate time series data, under the assumption that effective information is available in the series. To impute the missing values, we find the sub-sequence most similar to the one immediately before (resp. after) the gap, then fill the gap with the sub-sequence that follows (resp. precedes) this most similar one. The Dynamic Time Warping algorithm is used to compare sub-sequences, and it is combined with a shape-feature extraction algorithm to discard insignificant candidates. Eight well-known real-world data sets are used to evaluate the proposed approach against five other methods on several indicators. The results show that our approach is the most robust of the compared methods for time series with high auto-correlation and cross-correlation, strong seasonality, large gaps, and complex distributions.
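The forward direction of the idea described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `dtw_distance` and `impute_gap_forward`, the absolute-difference local cost, the sliding step, and the choice of a query window as long as the gap are all hypothetical, and the shape-feature filtering step and the backward (after-the-gap) search are omitted.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic dynamic time warping cost with an absolute-difference local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def impute_gap_forward(x, gap_start, gap_len, step=1):
    """Fill the NaN gap x[gap_start:gap_start + gap_len] (assumed layout).

    The query is the window of length gap_len just before the gap. Every
    fully observed window of length 2 * gap_len is split into a candidate
    (first half) and its follower (second half); the follower of the
    candidate closest to the query under DTW is copied into the gap.
    """
    x = np.asarray(x, dtype=float)
    if gap_start < gap_len:
        raise ValueError("not enough data before the gap to build a query")
    query = x[gap_start - gap_len:gap_start]
    best_cost, best_start = np.inf, None
    for s in range(0, len(x) - 2 * gap_len + 1, step):
        window = x[s:s + 2 * gap_len]
        # Skipping windows with NaN also excludes the query itself,
        # because its follower would be the gap.
        if np.isnan(window).any():
            continue
        c = dtw_distance(query, window[:gap_len])
        if c < best_cost:
            best_cost, best_start = c, s
    if best_start is None:
        raise ValueError("no fully observed candidate window found")
    filled = x.copy()
    filled[gap_start:gap_start + gap_len] = x[best_start + gap_len:best_start + 2 * gap_len]
    return filled


# Usage on a synthetic seasonal series (period 20) with one gap of 20 points.
t = np.arange(200)
truth = np.sin(2 * np.pi * t / 20)
series = truth.copy()
series[100:120] = np.nan
filled = impute_gap_forward(series, gap_start=100, gap_len=20)
```

On a strictly periodic series like this one, the copied follower coincides with the hidden values up to floating-point error. On real data the quality of the fill depends on how closely past patterns recur, which is consistent with the abstract's emphasis on strong seasonality and high auto-correlation.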
