Background and threshold: critical comparison of methods of determination.

Different procedures for identifying outliers in geochemical data are reviewed and tested. The calculation of [mean ± 2 standard deviations (sdev)] to estimate threshold values separating background data from anomalies, still in use almost 50 years after its introduction, delivers arbitrary estimates. The boxplot, [median ± 2 median absolute deviations (MAD)] and empirical cumulative distribution functions are better suited to assist in estimating threshold values and the range of background data. However, these methods can each lead to a different estimate of the threshold. Graphical inspection of the empirical data distribution, using a variety of tools from exploratory data analysis, is thus essential before estimating threshold values or defining background. There is no good reason to continue using the [mean ± 2 sdev] rule. It was originally proposed as a 'filter' to identify approximately 2½% of the data at each extreme for further inspection, at a time when computers to do the drudgery of numerical operations were not widely available and no other practical methods existed. Graphical inspection using statistical and geographical displays to isolate sets of background data is far better suited to estimating the range of background variation and thresholds, action levels (e.g., maximum admissible concentrations, or MAC values) or clean-up goals in environmental legislation.
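The three numerical rules named in the abstract can be compared with a minimal Python sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the function names, the made-up concentration values, the 1.4826 scaling of the MAD (a common convention that makes it comparable to the sdev for normally distributed data), the use of 1.5 times the interquartile range for the boxplot fences, and the quartile interpolation method.

```python
import statistics

# Assumption: scale factor that makes the MAD comparable to the sdev
# when the data are normally distributed.
MAD_SCALE = 1.4826


def mean_2sdev(values):
    """Return (lower, upper) limits from the classical mean +/- 2 sdev rule."""
    centre = statistics.mean(values)
    spread = statistics.stdev(values)
    return centre - 2 * spread, centre + 2 * spread


def median_2mad(values):
    """Return (lower, upper) limits from the robust median +/- 2 MAD rule."""
    centre = statistics.median(values)
    abs_dev = [abs(v - centre) for v in values]
    spread = MAD_SCALE * statistics.median(abs_dev)
    return centre - 2 * spread, centre + 2 * spread


def boxplot_fences(values):
    """Return (lower, upper) boxplot fences: quartiles -/+ 1.5 * IQR."""
    # Assumption: 'inclusive' quartile interpolation; other quartile
    # definitions (e.g. hinges) give slightly different fences.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical concentrations (arbitrary units) with one extreme value.
concentrations = [12, 15, 14, 13, 16, 15, 14, 13, 15, 14, 250]

for name, rule in [("mean +/- 2 sdev", mean_2sdev),
                   ("median +/- 2 MAD", median_2mad),
                   ("boxplot fences", boxplot_fences)]:
    lower, upper = rule(concentrations)
    flagged = [v for v in concentrations if v < lower or v > upper]
    print(f"{name}: lower={lower:.2f}, upper={upper:.2f}, flagged={flagged}")
```

In this toy data set the single extreme value inflates both the mean and the sdev, so the classical rule places its upper limit far above the bulk of the data and its lower limit below zero, whereas the two robust rules stay close to the main body of values. The numbers carry no geochemical meaning; the sketch only shows why the rules can disagree and why inspecting the distribution graphically remains necessary.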
