Features and performance of some outlier detection methods

This paper reviews several statistical methods currently used for outlier identification and compares their performance theoretically for statistical distributions typical of experimental data, taking values derived from the distribution of extreme order statistics as reference terms. A simple modification of the popular, widely used box-plot method is introduced to overcome a major limitation concerning sample size. The methods considered are applied to two data sets: a historical one, concerning the evaluation of an astronomical constant by a number of leading observatories, and a substantial database from an ongoing investigation on the absolute measurement of gravitational acceleration, which exhibits peculiar aspects concerning outliers. Some problems related to outlier treatment are examined, and the need for both statistical analysis and expert opinion in proper outlier management is underlined.
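For orientation, the conventional box-plot rule that the abstract refers to can be sketched as follows. This is only the standard Tukey rule with the customary multiplier k = 1.5, not the modification proposed in the paper, which the abstract does not specify. The function names, the quartile convention (`method="inclusive"`), and the sample values are assumptions chosen for illustration.

```python
from statistics import quantiles


def tukey_fences(data, k=1.5):
    """Return the (lower, upper) box-plot fences Q1 - k*IQR and Q3 + k*IQR.

    The quartile convention is an assumption; other conventions give
    slightly different fences on small samples.
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(data, k=1.5):
    """Return the observations lying strictly outside the fences."""
    lower, upper = tukey_fences(data, k)
    return [x for x in data if x < lower or x > upper]


# Made-up sample for illustration only (not data from the paper).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(tukey_fences(sample))
print(flag_outliers(sample))
```

Note that the multiplier k is fixed here regardless of how many observations the sample contains.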
