General Approaches to Stepwise Identification of Unusual Values in Data Analysis

A common goal in data analysis is the identification of unusual values. This can be done indirectly (after performing a robust analysis) or directly (via some detection procedure). This paper summarizes the backwards-stepping approach to the detection of unusual values in a data set. The approach is simple to apply, flexible, and resistant to masking effects (where a group of unusual values hides one another from detection). Applications to univariate, multivariate, and regression data, as well as to other problems, are discussed. Simulations are used to investigate the properties of this strategy. It is shown that identifying unusual values with appropriate detection procedures can be considerably more effective than indirect detection through a robust analysis.
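The abstract does not spell out the procedure, so the following is only a minimal sketch of one backwards-stepping scheme for univariate data, written under our own assumptions: the initial suspect set is a fixed fraction of the points farthest from the median, and a fixed cutoff of 3 stands in for the proper critical values a real procedure would use. The function name `backwards_step_outliers` and its parameters `initial_fraction` and `cutoff` are hypothetical. The idea illustrated is that suspects are judged against a clean set that excludes them, so a cluster of unusual values cannot inflate the scale estimate used to test it.

```python
import math
import statistics


def backwards_step_outliers(data, initial_fraction=0.2, cutoff=3.0):
    """Return indices of points flagged as unusual by a backwards-stepping search.

    Hypothetical sketch: the suspect-set rule and the fixed cutoff are
    illustrative assumptions, not the critical values of any published test.
    """
    n = len(data)
    if n < 3:
        return []
    med = statistics.median(data)

    # Rank points by distance from the median (a robust centre), most extreme first.
    order = sorted(range(n), key=lambda i: abs(data[i] - med), reverse=True)

    # Step 1: set aside a deliberately generous suspect set, keeping at least
    # two points in the clean set so a standard deviation can be computed.
    k = min(math.ceil(initial_fraction * n), n - 2)
    suspects = order[:k]
    clean = order[k:]

    # Step 2: step backwards, returning the least extreme suspect to the
    # clean set for as long as it looks consistent with that set.
    while suspects:
        clean_values = [data[i] for i in clean]
        centre = statistics.fmean(clean_values)
        spread = statistics.stdev(clean_values)
        j = min(suspects, key=lambda i: abs(data[i] - centre))
        distance = abs(data[j] - centre)
        if spread > 0:
            z = distance / spread
        else:
            z = 0.0 if distance == 0 else math.inf
        if z > cutoff:
            break  # this suspect and all remaining (more extreme) ones stay flagged
        suspects.remove(j)
        clean.append(j)

    return sorted(suspects)


values = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0, 24.0]
print(backwards_step_outliers(values))  # → [8, 9]
```

With the two large values set aside, the clean set has mean 10.0 and standard deviation 0.2, so 24.0 lies 70 standard deviations away and both points remain flagged. On the same data without those two values, the mildly extreme suspects 9.7 and 10.3 are returned to the clean set and nothing is flagged.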
