Time Series Data Cleaning: From Anomaly Detection to Anomaly Repairing

Errors are prevalent in time series data, such as GPS trajectories or sensor readings. Existing methods focus on anomaly detection rather than on repairing the detected anomalies. Simply filtering out dirty data via anomaly detection leaves the time series incomplete, so applications that rely on it may still be unreliable. Instead of discarding anomalies, we propose to repair them iteratively, combining the temporal nature of time series exploited in anomaly detection with the minimum change principle widely adopted in data repairing. Our major contributions include: (1) a novel framework of iterative minimum repairing (IMR) over time series data, (2) an explicit analysis of the convergence of the proposed iterative minimum repairing, and (3) efficient estimation of parameters in each iteration. Remarkably, with incremental computation, we reduce the complexity of parameter estimation from O(n) to O(1). Experiments on real datasets demonstrate that our proposal outperforms state-of-the-art approaches. In particular, we show that the proposed repairing improves a downstream time series classification application.
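The abstract does not spell out the IMR model or its estimation procedure, so the following is only a toy sketch of the general shape of iterative repairing under assumptions of our own. It assumes an AR(1) model with intercept, x[t] ≈ c + phi * x[t-1], refit by least squares on the current (partially repaired) series. It changes a single value per iteration, the one with the largest one-step-ahead residual above a threshold tau, as a stand-in for keeping modifications small, and it stops once no residual exceeds tau. What it does make concrete is why incremental computation helps: the least-squares estimates depend only on five running sums, and changing one value touches at most two (x[t-1], x[t]) pairs, so the parameters can be refreshed in O(1) instead of rescanning all n points. The names `Ar1Stats`, `iterative_repair`, `tau` and `max_iter` are hypothetical and are not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Ar1Stats:
    """Running sums for a least-squares AR(1) fit with intercept,
    x[t] ~ c + phi * x[t-1], over the pairs (x[t-1], x[t]), t = 1..n-1.
    Hypothetical helper for illustration only."""
    m: int = 0        # number of pairs
    sx: float = 0.0   # sum of predictors x[t-1]
    sy: float = 0.0   # sum of responses x[t]
    sxx: float = 0.0  # sum of x[t-1] ** 2
    sxy: float = 0.0  # sum of x[t-1] * x[t]

    @classmethod
    def from_series(cls, x):
        # One full O(n) pass, done once before iterating.
        s = cls()
        for prev, cur in zip(x[:-1], x[1:]):
            s._add_pair(prev, cur, 1)
        return s

    def _add_pair(self, prev, cur, sign):
        self.m += sign
        self.sx += sign * prev
        self.sy += sign * cur
        self.sxx += sign * prev * prev
        self.sxy += sign * prev * cur

    def replace_value(self, x, k, new):
        # O(1) update: x[k] occurs in at most two pairs, (k-1, k) and (k, k+1).
        pairs = [(i, i + 1) for i in (k - 1, k) if i >= 0 and i + 1 < len(x)]
        for i, j in pairs:
            self._add_pair(x[i], x[j], -1)  # retract the old contributions
        x[k] = new
        for i, j in pairs:
            self._add_pair(x[i], x[j], 1)   # add the new value's contributions

    def params(self):
        # Closed-form least squares for (c, phi) from the running sums.
        den = self.m * self.sxx - self.sx ** 2
        if den == 0:
            # Degenerate case (all predictors equal): predict the mean response.
            return self.sy / self.m, 0.0
        phi = (self.m * self.sxy - self.sx * self.sy) / den
        c = (self.sy - phi * self.sx) / self.m
        return c, phi


def iterative_repair(series, tau, max_iter=100):
    """Change at most one value per iteration until no one-step-ahead
    residual exceeds tau (hypothetical selection and stopping rules)."""
    x = [float(v) for v in series]  # work on a copy
    if len(x) < 3:
        return x  # too short to fit both c and phi
    stats = Ar1Stats.from_series(x)
    for _ in range(max_iter):
        c, phi = stats.params()  # O(1) re-estimation from the sums
        # Largest one-step-ahead residual; this scan is still O(n).
        worst, t = max(
            (abs(c + phi * x[i - 1] - x[i]), i) for i in range(1, len(x))
        )
        if worst <= tau:
            break  # converged: every value agrees with the model within tau
        stats.replace_value(x, t, c + phi * x[t - 1])
    return x


dirty = [float(t) for t in range(10)]
dirty[5] = 30.0  # injected spike; the clean value would be 5.0
repaired = iterative_repair(dirty, tau=2.0)
print([round(v, 2) for v in repaired])
```

In this toy run the spike at index 5 ends up close to its clean value of 5.0 and every other value is left untouched. The first repair of that point uses parameters still distorted by the spike, and a later iteration revises the same point once the refit model is cleaner, which is the reason for iterating at all. Only the parameter update is O(1) here; finding the worst residual still costs O(n) per iteration.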
