Two-stage permutation tests for determining homogeneity within a spatial cluster

ABSTRACT The discovery of spatial clusters formed by proximal spatial units with similar non-spatial attribute values plays an important role in spatial data analysis. Although several spatial contiguity-constrained clustering methods are currently available, almost all of them report clusters in a geographical dataset even when the dataset has no natural clustering structure, and statistically evaluating the significance of the degree of homogeneity within a single spatial cluster remains difficult. To overcome this limitation, this study develops a permutation test approach. Specifically, the homogeneity of a spatial cluster is measured based on the local variance and cluster member permutation, and two-stage permutation tests are developed to determine the significance of the degree of homogeneity within each spatial cluster. The proposed permutation tests can be integrated into existing spatial clustering algorithms to detect homogeneous spatial clusters. The proposed tests are compared with four existing tests (i.e., Park’s test, the contiguity-constrained nonparametric analysis of variance (COCOPAN) method, the spatial scan statistic, and the q-statistic) using two simulated and two meteorological datasets. The comparison shows that the proposed two-stage permutation tests are more effective at identifying homogeneous spatial clusters and at determining homogeneous clustering structures in practical applications.
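To illustrate the general idea of testing within-cluster homogeneity by permutation, the following is a minimal single-stage sketch in Python. It is not the two-stage procedure proposed in the paper: it assumes that homogeneity is summarized by the plain variance of the attribute values in the cluster, and that the null distribution is built by drawing random member sets of the same size from the whole dataset, ignoring spatial contiguity. The function name `permutation_homogeneity_test` and its parameters are hypothetical and chosen only for this illustration.

```python
import numpy as np


def permutation_homogeneity_test(values, cluster_idx, n_perm=999, seed=0):
    """Simplified one-stage permutation test for within-cluster homogeneity.

    Assumption (not from the paper): the test statistic is the variance of
    the attribute values of the cluster members, and random member sets are
    drawn without regard to spatial contiguity.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    cluster_idx = np.asarray(cluster_idx)
    k = cluster_idx.size

    # Observed statistic: variance of the attribute inside the cluster.
    observed = values[cluster_idx].var()

    # Count random member sets that are at least as homogeneous.
    count = 0
    for _ in range(n_perm):
        sample = rng.choice(values.size, size=k, replace=False)
        if values[sample].var() <= observed:
            count += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value


# Usage: 10 nearly identical values followed by 90 widely spread values.
values = np.concatenate([5.0 + 0.01 * np.arange(10), np.linspace(0.0, 100.0, 90)])
observed, p_value = permutation_homogeneity_test(values, np.arange(10))
print(observed, p_value)
```

A small p-value suggests that the cluster is more homogeneous than a randomly assembled group of the same size. A test that respects spatial structure would restrict the random member sets to spatially contiguous units, which this sketch deliberately omits.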
