Spatial association between regionalizations using the information-theoretical V-measure

ABSTRACT There is keen interest in calculating spatial associations between two variables spanning the same study area. Many methods for calculating such associations have been proposed, but the case in which both variables are categorical is underdeveloped, even though many datasets of interest take the form of either regionalizations or thematic maps. In this paper, we advance this case by adapting the so-called V-measure method from its original information-theoretical formulation to an analysis-of-variance formulation, which provides more insight for spatial analysis. We present a step-by-step derivation of the V-measure from the perspective of the analysis of variance. The method produces three indices of global association and two sets of local association indicators, which can be mapped to show the spatial distribution of association strength. Open-source software for calculating all indices from vector datasets accompanies the paper. To showcase the utility of the V-measure, we identify three application contexts (comparative, associative, and derivative) and present an example of each. The V-measure method has several advantages over the widely used Mapcurves method: it has clear interpretations in terms of mutual information as well as in terms of analysis of variance, it provides a more precise assessment of association, it is ready to use through the accompanying software, and the examples given in the paper serve as a guide to the gamut of its possible applications. Two specific contributions stemming from our re-analysis of the V-measure are the identification of a conceptual flaw in the Geographical Detector, a method for quantifying associations between numerical and categorical spatial variables, and a proposal for a new, cartographically based algorithm for finding an optimal number of regions in clustering-derived regionalizations.
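
As an illustration of the information-theoretical formulation mentioned above, the following minimal sketch computes homogeneity, completeness, and their weighted harmonic mean (the V-measure as defined in the clustering literature) from a table of overlap areas between two maps. It is not the accompanying software: the function name `v_measure`, the `areas` input layout, and the assignment of rows to the first map and columns to the second are assumptions made for this example, and it is presumed here that the three global indices correspond to these three quantities.

```python
import math


def v_measure(areas, beta=1.0):
    """Return (homogeneity, completeness, V) for two overlaid categorical maps.

    areas[i][j] is the overlap area between region i of the first map (rows)
    and zone j of the second map (columns). This layout is an assumption of
    this sketch, not the interface of the paper's software.
    """
    total = sum(sum(row) for row in areas)
    row_sums = [sum(row) for row in areas]
    col_sums = [sum(col) for col in zip(*areas)]

    def entropy(sums):
        # Shannon entropy of an area-weighted distribution.
        return -sum((s / total) * math.log(s / total) for s in sums if s > 0)

    h_rows = entropy(row_sums)
    h_cols = entropy(col_sums)

    # Conditional entropies H(rows | cols) and H(cols | rows).
    h_rows_given_cols = 0.0
    h_cols_given_rows = 0.0
    for i, row in enumerate(areas):
        for j, a in enumerate(row):
            if a > 0:
                h_rows_given_cols -= (a / total) * math.log(a / col_sums[j])
                h_cols_given_rows -= (a / total) * math.log(a / row_sums[i])

    # Homogeneity: each zone of the second map lies within one region of the first.
    homogeneity = 1.0 if h_rows == 0 else 1.0 - h_rows_given_cols / h_rows
    # Completeness: each region of the first map lies within one zone of the second.
    completeness = 1.0 if h_cols == 0 else 1.0 - h_cols_given_rows / h_cols

    denom = beta * homogeneity + completeness
    v = 0.0 if denom == 0 else (1.0 + beta) * homogeneity * completeness / denom
    return homogeneity, completeness, v


if __name__ == "__main__":
    # Second map subdivides the first: fully homogeneous, only partly complete.
    print(v_measure([[1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]))
```

Using overlap areas rather than object counts as weights is what makes the measure spatial; with `beta=1.0` homogeneity and completeness contribute equally to the combined index.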
