A hierarchical clustering method for multivariate geostatistical data

Abstract. Multivariate geostatistical data have become ubiquitous in the geosciences and pose substantial analysis challenges. One of them is grouping data locations into spatially contiguous clusters such that locations within the same cluster are similar to one another while locations in different clusters are dissimilar. Spatial contiguity makes the clusters easier to interpret, since they can be read as meaningful geographical subregions. In this paper, we develop an agglomerative hierarchical clustering approach that takes into account the spatial dependency between observations. It relies on a dissimilarity matrix built from a non-parametric kernel estimator of the multivariate spatial dependence structure of the data. It integrates existing methods for finding the optimal number of clusters and for evaluating the contribution of each variable to the clustering. The ability of the proposed approach to produce spatially compact, connected and meaningful clusters is assessed on multivariate synthetic and real datasets. The proposed clustering method gives satisfactory results compared to other similar geostatistical clustering methods.
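The general idea of clustering on a kernel-smoothed dissimilarity can be illustrated with a minimal sketch. It is not the estimator defined in the paper. It assumes a Gaussian kernel, a single hand-picked bandwidth, squared Euclidean differences between standardized variables, and average linkage. The function names `kernel_dissimilarity` and `spatial_clusters` and the synthetic data are hypothetical and exist only for this illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import cdist, squareform


def kernel_dissimilarity(coords, values, bandwidth):
    """Kernel-weighted mean squared attribute difference for each pair of locations.

    d_ij = sum_kl w_ik w_jl ||z_k - z_l||^2 / sum_kl w_ik w_jl,
    where w is a Gaussian kernel on spatial distance (an assumed choice).
    """
    # Standardize each variable so that none dominates the attribute distance.
    z = (values - values.mean(axis=0)) / values.std(axis=0)
    # Spatial kernel weights between every pair of locations.
    w = np.exp(-0.5 * cdist(coords, coords, "sqeuclidean") / bandwidth**2)
    # Squared attribute differences between every pair of observations.
    g = cdist(z, z, "sqeuclidean")
    d = (w @ g @ w.T) / np.outer(w.sum(axis=1), w.sum(axis=1))
    # Enforce exact symmetry and a zero diagonal so the matrix can be condensed.
    d = 0.5 * (d + d.T)
    np.fill_diagonal(d, 0.0)
    return d


def spatial_clusters(coords, values, bandwidth, n_clusters):
    """Agglomerative (average-linkage) clustering on the smoothed dissimilarity."""
    d = kernel_dissimilarity(coords, values, bandwidth)
    tree = linkage(squareform(d), method="average")
    return d, fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic example: two subregions split at x = 0.5, each with its own mean.
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(size=(n, 2))
region = coords[:, 0] >= 0.5
values = np.where(region[:, None], 3.0, 0.0) + rng.normal(scale=0.3, size=(n, 2))

d, labels = spatial_clusters(coords, values, bandwidth=0.05, n_clusters=2)
```

Because each entry of the dissimilarity averages attribute differences over the spatial neighbourhoods of both locations, nearby observations pull each other toward the same cluster, which is what encourages contiguous groups. The bandwidth controls how strongly that smoothing acts and would have to be tuned for real data.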
