Outlier detection by means of robust regression estimators for use in engineering science

This study compares the ability of different robust regression estimators to detect and classify outliers. Well-known estimators with high breakdown points were compared on simulated data, with the mean success rate (MSR) as the comparison criterion. The results showed that least median of squares (LMS) and least trimmed squares (LTS) were the most successful methods for data that included leverage points, masking and swamping effects, or critical and concentrated outliers. We recommend LMS and LTS as diagnostic tools for classifying outliers, because they remain robust even when the model is heavily contaminated or the outliers have a complicated structure.
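As an illustration of the kind of procedure being compared, the following is a minimal sketch of LMS-based outlier detection on simulated data. It is not the study's implementation: the random elemental-subset search, the simulated design (40 clean points plus 10 planted bad leverage points), the function names `lms_fit` and `flag_outliers`, and the 2.5 cutoff on standardized residuals are all assumptions made for the example. The scale estimate uses the finite-sample correction commonly paired with LMS, 1.4826 (1 + 5/(n - p)) times the square root of the minimized median squared residual.

```python
import numpy as np

def lms_fit(X, y, n_trials=2000, seed=0):
    """Approximate LMS fit by random elemental subsets (assumed search strategy)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_crit = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)
        try:
            # Exact fit through p randomly chosen observations.
            beta = np.linalg.solve(X[idx], y[idx])
        except np.linalg.LinAlgError:
            continue  # skip degenerate subsets
        crit = np.median((y - X @ beta) ** 2)  # LMS objective
        if crit < best_crit:
            best_beta, best_crit = beta, crit
    return best_beta, best_crit

def flag_outliers(X, y, beta, crit, cutoff=2.5):
    """Flag observations whose standardized LMS residual exceeds the cutoff."""
    n, p = X.shape
    scale = 1.4826 * (1.0 + 5.0 / (n - p)) * np.sqrt(crit)
    return np.abs(y - X @ beta) / scale > cutoff

# Simulated data (hypothetical design): y = 2 + 0.5 x + noise.
rng = np.random.default_rng(42)
n_good, n_bad = 40, 10
x = rng.uniform(0.0, 10.0, n_good)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.1, n_good)
# Planted bad leverage points: far out in x and far from the true line.
x = np.concatenate([x, rng.uniform(18.0, 22.0, n_bad)])
y = np.concatenate([y, rng.normal(0.0, 0.1, n_bad)])
X = np.column_stack([np.ones_like(x), x])

beta, crit = lms_fit(X, y)
flags = flag_outliers(X, y, beta, crit)
# Share of planted outliers that were flagged in this single run.
detected = flags[n_good:].mean()
print("LMS coefficients:", beta)
print("share of planted outliers flagged:", detected)
```

Averaging a detection indicator like `detected` over many simulated samples would give a success-rate style criterion; the exact MSR definition used in the study is not stated in the abstract, so this should be read only as one plausible reading.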
