A simple more general boxplot method for identifying outliers

Abstract: The boxplot method (Tukey, Exploratory Data Analysis, Addison-Wesley, Reading, MA, 1977) is a graphically based method of identifying outliers that is appealing not only for its simplicity but also because it does not use the extreme potential outliers in computing a measure of dispersion. The inner and outer fences are defined in terms of the hinges (or fourths) and are therefore not distorted by a few extreme values. Such distortion could cause some outliers to go undetected, a problem known as “masking”. A method for determining the probability associated with any fence or observation is proposed, based on the cumulative distribution function of the order statistics. This allows the statistician to assess easily, in a probability sense, the degree to which an observation is dissimilar to the majority of the observations. In addition, an adaptation for approximately normal but somewhat asymmetric distributions is suggested.
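To make the fence construction concrete, the following is a minimal sketch of the classical boxplot rule that the abstract takes as its starting point. It is not the generalized method proposed in the paper, whose fence constants come from the distribution of the order statistics and are not given in the abstract. The function names (`hinges`, `fences`, `label_outliers`) are hypothetical, and the multipliers 1.5 and 3.0 are the conventional values for inner and outer fences, assumed here for illustration.

```python
from statistics import median


def hinges(data):
    """Return the lower and upper hinges (fourths) of the data.

    Each hinge is the median of one half of the sorted data; when the
    sample size is odd, the overall median is included in both halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2
    return median(xs[:half]), median(xs[n - half:])


def fences(data, inner=1.5, outer=3.0):
    """Return inner and outer fences built from the hinges.

    The multipliers are the conventional boxplot values (an assumption
    here), applied to the spread between the two hinges.
    """
    lower, upper = hinges(data)
    spread = upper - lower
    return {
        "inner": (lower - inner * spread, upper + inner * spread),
        "outer": (lower - outer * spread, upper + outer * spread),
    }


def label_outliers(data, k=1.5):
    """Return the observations that fall outside the fences at multiplier k."""
    lower, upper = hinges(data)
    spread = upper - lower
    low_fence, high_fence = lower - k * spread, upper + k * spread
    return [x for x in data if x < low_fence or x > high_fence]


sample = [2, 3, 4, 5, 6, 7, 8, 9, 10, 50]
print(hinges(sample))
print(fences(sample))
print(label_outliers(sample))
```

Because the hinges depend only on the central part of the ordered sample, the extreme value 50 does not move the fences, which is the resistance to masking that the abstract describes.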
