On-line outlier detection and data cleaning

Outliers are observations that do not follow the statistical distribution of the bulk of the data, and consequently may lead to erroneous results in statistical analysis. Many conventional outlier detection tools assume that the data are independent and identically distributed. In this paper, an outlier-resistant data filter-cleaner is proposed. The proposed data filter-cleaner includes an on-line outlier-resistant estimate of the process model and combines it with a modified Kalman filter to detect and "clean" outliers. Compared with existing methods, the proposed method has the following advantages: (a) a priori knowledge of the process model is not required; (b) it is applicable to autocorrelated data; (c) it can be implemented on-line; and (d) it aims to clean (i.e., detect and replace) only outliers, preserving all other information in the data. © 2004 Elsevier Ltd. All rights reserved.
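
The detect-and-replace idea can be illustrated with a minimal sketch. It is not the algorithm of the paper: in place of the on-line robust model estimate and the modified Kalman filter, it assumes a sliding-window AR(1) predictor fitted to previously cleaned values, a MAD-based residual scale, and a hard rejection rule. The function name `clean_online` and the parameters `window`, `min_history`, `k`, and `min_scale` are hypothetical choices made for this illustration.

```python
from statistics import median


def clean_online(samples, window=20, min_history=10, k=3.0, min_scale=1e-9):
    """Illustrative on-line filter-cleaner (a sketch, not the paper's method).

    Each new sample is compared with a one-step AR(1) prediction built from
    the last `window` cleaned values. A sample whose prediction residual
    exceeds k robust standard deviations is flagged and replaced by the
    prediction; every other sample is passed through unchanged.
    """
    cleaned, flags = [], []
    for x in samples:
        history = cleaned[-window:]
        if len(history) < min_history:
            # Warm-up: too little cleaned data to judge, accept as-is.
            cleaned.append(x)
            flags.append(False)
            continue

        # Center the window with the median (resistant to stray values).
        center = median(history)
        dev = [h - center for h in history]

        # AR(1) coefficient by least squares on the cleaned, centered window.
        num = sum(b * a for a, b in zip(dev[:-1], dev[1:]))
        den = sum(a * a for a in dev[:-1])
        phi = num / den if den > 0 else 0.0
        phi = max(-0.99, min(0.99, phi))  # keep the predictor stable

        # One-step-ahead prediction of the incoming sample.
        pred = center + phi * dev[-1]

        # Robust residual scale: 1.4826 * MAD of the in-window residuals.
        resid = [b - phi * a for a, b in zip(dev[:-1], dev[1:])]
        med_r = median(resid)
        scale = 1.4826 * median([abs(r - med_r) for r in resid])
        scale = max(scale, min_scale)

        # Hard rejection: replace the sample only if it is an outlier.
        is_outlier = abs(x - pred) > k * scale
        cleaned.append(pred if is_outlier else x)
        flags.append(is_outlier)
    return cleaned, flags
```

Calling `cleaned, flags = clean_online(series)` on a list of floats returns the cleaned series together with a per-sample outlier flag. Because only cleaned values feed the predictor, a single spike does not contaminate later predictions. The same property means that a genuine, sustained level shift would also be rejected, which is one reason a more careful scheme is needed in practice.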
