Boxplot-Based Outlier Detection for the Location-Scale Family

Boxplots are among the most widely used exploratory data analysis (EDA) tools in statistical practice. Typical applications include eliciting information about the underlying distribution (shape, location, etc.) and identifying possible outliers. This article focuses on a modified boxplot whose lower and upper fences are similar in concept to those of the traditional boxplot. The traditional fences are built from the lower and upper quartiles, respectively, and a multiple of the interquartile range (IQR). The proposed fences are instead placed at multiples of the lower and upper semi-interquartile ranges (SIQR), respectively, measured from the sample median. Any observation beyond the proposed fences is labeled a potential outlier. An exact expression is derived, for the family of location-scale distributions, for the probability that at least one sample observation is wrongly classified as an outlier, the so-called “some-outside rate per sample” (Hoaglin et al., 1986), and this expression is used to determine the fence constants. Tables of fence constants are provided for a number of well-known location-scale distributions, along with illustrations using data, and the performance of the outlier detection rule is explored in a simulation study.
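The rule described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The fence form, lower fence = Q2 - k_L*(Q2 - Q1) and upper fence = Q2 + k_U*(Q3 - Q2), is our reading of the abstract. Q1, Q2 (the median) and Q3 are taken from `statistics.quantiles(..., method="inclusive")`, whose quartile definition may differ from the one used in the article. The article derives the some-outside rate exactly; here it is instead estimated by Monte Carlo simulation. The symmetric constants k_L = k_U = k and the normal family are chosen purely for illustration, and all function names are hypothetical. Simulating only the standard member of the family is justified because the median and quartiles are equivariant under x -> a + b*x with b > 0. The set of flagged observations, and hence the some-outside rate, therefore does not depend on the location a or the scale b.

```python
import random
import statistics


def siqr_fences(x, k_lower, k_upper):
    """Lower and upper fences measured from the sample median.

    ASSUMED form (our reading of the abstract, not the article's code):
        lower fence = Q2 - k_lower * (Q2 - Q1)   # lower SIQR = Q2 - Q1
        upper fence = Q2 + k_upper * (Q3 - Q2)   # upper SIQR = Q3 - Q2
    ASSUMED quartile definition: statistics.quantiles(method="inclusive").
    """
    q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return q2 - k_lower * (q2 - q1), q2 + k_upper * (q3 - q2)


def flag_outliers(x, k_lower, k_upper):
    """Return the observations lying strictly beyond either fence."""
    lower, upper = siqr_fences(x, k_lower, k_upper)
    return [v for v in x if v < lower or v > upper]


def some_outside_rate(samples, k_lower, k_upper):
    """Monte Carlo estimate of the some-outside rate per sample: the share of
    outlier-free samples in which at least one observation is flagged."""
    flagged = sum(1 for s in samples if flag_outliers(s, k_lower, k_upper))
    return flagged / len(samples)


def standard_normal_samples(n, reps, seed=1):
    """Outlier-free samples from the standard member (mu=0, sigma=1) of the
    normal family; by location-scale equivariance this member suffices."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(reps)]


def calibrate_symmetric_k(samples, target=0.05, lo=0.0, hi=20.0, iters=30):
    """Bisection for (approximately) the smallest k, with k_lower = k_upper = k,
    whose estimated some-outside rate is at most `target`. Assumes the rate at
    `hi` already meets the target; otherwise `hi` is returned unchanged."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if some_outside_rate(samples, mid, mid) > target:
            lo = mid  # too many false alarms: move the fences outward
        else:
            hi = mid  # target met: try tighter fences
    return hi


if __name__ == "__main__":
    samples = standard_normal_samples(n=20, reps=2000)
    k = calibrate_symmetric_k(samples, target=0.05)
    print("k =", round(k, 2), "estimated rate =", some_outside_rate(samples, k, k))

    data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 10.0, 14.5]
    print("flagged:", flag_outliers(data, k, k))
```

The same simulated samples are reused at every bisection step. Widening either fence can only shrink the flagged set, so the estimated rate is non-increasing in k, and this is what makes bisection valid. For a skewed member of the family the two constants would generally differ. This abstract does not state how the article splits the rate between the lower and upper fences.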

References

[1] Boris Iglewicz, et al. A Simple Univariate Outlier Identification Procedure, 2001.

[2] Subha Chakraborti, et al. Boxplot-based phase I control charts for time between events, 2012, Qual. Reliab. Eng. Int.

[3] Dong-Sheng Cao, et al. A new strategy of outlier detection for QSAR/QSPR, 2009, J. Comput. Chem.

[4] Hamid Louni. Outlier Detection in ARMA Models, 2008.

[5] Andrea Cerioli, et al. Multivariate Outlier Detection With High-Breakdown Estimators, 2010.

[6] Shizuhiko Nishisato, et al. Elements of Dual Scaling: An Introduction to Practical Data Analysis, 1993.

[7] Neil C. Schwertman, et al. Identifying outliers with sequential fences, 2007, Comput. Stat. Data Anal.

[8] J. Brian Gray, et al. Introduction to Linear Regression Analysis, 2002, Technometrics.

[9] Fred Spiring, et al. Introduction to Statistical Quality Control, 2007, Technometrics.

[10] Jammalamadaka. Introduction to Linear Regression Analysis (3rd ed.), 2003.

[11] J. Tukey, et al. Performance of Some Resistant Rules for Outlier Labeling, 1986.

[12] A. C. Kimber, et al. Exploratory Data Analysis for Possibly Censored Data from Skewed Distributions, 1990.

[13] R. Serfling, et al. Nonparametric depth-based multivariate outlier identifiers, and masking robustness properties, 2010.

[14] Neil C. Schwertman, et al. A simple more general boxplot method for identifying outliers, 2004, Comput. Stat. Data Anal.

[15] Kenneth Carling, et al. Resistant outlier rules and the non-Gaussian case, 1998.

[16] W. R. Buckland, et al. Outliers in Statistical Data, 1979.

[17] M. Braga, et al. Exploratory Data Analysis, 2018, Encyclopedia of Social Network Analysis and Mining, 2nd Ed.

[18] Ali S. Hadi, et al. Detection of outliers, 2009.

[19] Vic Barnett, et al. The Study of Outliers: Purpose and Model, 1978.

[20] Subhabrata Chakraborti, et al. Outlier detection for multivariate skew-normal data: a comparative study, 2013.

[21] Boris Iglewicz, et al. A Simple Univariate Outlier Identification Procedure Designed for Large Samples, 2007, Commun. Stat. Simul. Comput.