Multivariate Spatial Outlier Detection Using Robust Geographically Weighted Methods

Outlier detection is often a key task in statistical analysis and helps guard against poor decision-making based on results that have been influenced by anomalous observations. For multivariate data sets, large Mahalanobis distances in raw data space, or large Mahalanobis distances in principal components analysis (PCA) transformed data space, are routinely used to detect outliers. Detection in PCA space can also utilise goodness-of-fit distances. For spatial applications, however, these global forms can only detect outliers in a non-spatial manner. This can result in false positive detections, such as when an observation’s spatial neighbours are similar, or false negative detections, such as when its spatial neighbours are dissimilar. To avoid such misclassifications, we demonstrate that local adaptations of various global methods can be used to detect multivariate spatial outliers. In particular, we account for local spatial effects via the use of geographically weighted data with either Mahalanobis distances or PCA. Detection performance is assessed using simulated data as well as freshwater chemistry data collected over all of Great Britain. Results clearly show the value of both geographically weighted methods for outlier detection.
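
To make the contrast between global and local detection concrete, the following is a minimal sketch of a geographically weighted Mahalanobis distance. It is not the authors' implementation. The function names (`gw_mahalanobis`, `global_mahalanobis`), the Gaussian kernel and the fixed bandwidth are all assumptions made for illustration. The sketch uses ordinary weighted means and covariances, so it does not include the robust estimation referred to in the title.

```python
import numpy as np


def _weighted_d2(X, w, x):
    """Squared Mahalanobis distance of x from the w-weighted mean of X."""
    w = w / w.sum()                    # normalise weights to sum to one
    mu = w @ X                         # weighted mean vector
    R = X - mu                         # centred data
    cov = (R * w[:, None]).T @ R       # weighted covariance matrix
    diff = x - mu
    return float(diff @ np.linalg.solve(cov, diff))


def global_mahalanobis(X):
    """Global squared Mahalanobis distances (equal weight for every row)."""
    X = np.asarray(X, dtype=float)
    w = np.ones(X.shape[0])
    return np.array([_weighted_d2(X, w, x) for x in X])


def gw_mahalanobis(coords, X, bandwidth):
    """Local squared Mahalanobis distances.

    For each observation, the mean and covariance are re-estimated with
    Gaussian kernel weights centred on that observation's location
    (assumed kernel; `bandwidth` is in the same units as `coords`).
    """
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    d2 = np.empty(X.shape[0])
    for i in range(X.shape[0]):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)
        d2[i] = _weighted_d2(X, w, X[i])
    return d2
```

As a usage illustration, take a 10 by 10 grid on which both variables simply equal the coordinates, and overwrite the observation at location (2, 2) with the values (7, 7). That value is ordinary for the data set as a whole but unusual among its neighbours, so under these assumptions `gw_mahalanobis(coords, X, bandwidth=1.5)` would be expected to rank it highest, while `global_mahalanobis(X)` would not: the kind of false negative described above.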
