A Generative Spatial Clustering Model for Random Data through Spanning Trees

The analysis of spatial data often requires aggregating geographical areas into larger regions, a process called regionalization or spatially constrained clustering. Existing regionalization algorithms assume that the items to be clustered are non-stochastic, an assumption that does not hold in many applications. In this work, we present a new probabilistic regionalization algorithm that allows spatially varying random variables as features. Hence, an area that differs markedly from its neighbors can still be considered a member of their cluster if it has a large variance. Our proposal is based on a Bayesian generative spatial product partition model. We build an effective Markov chain Monte Carlo algorithm that carries out a random walk on the space of all trees and the spatial partitions they induce through edge deletion. We evaluate our algorithm on synthetic data and on a municipality regionalization problem based on cancer incidence rates. The algorithm better accommodates the natural variation of the data and diminishes the effect of outliers, producing better results than state-of-the-art approaches.
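To illustrate how a spanning tree induces a spatial partition, the following minimal sketch removes a chosen set of edges from a tree and labels the resulting connected components as clusters. It is only an illustration of the edge-deletion idea, not the authors' sampler: the function name `partition_from_tree`, the toy tree, and the choice of removed edges are all assumptions made for this example.

```python
from collections import defaultdict


def partition_from_tree(n_nodes, tree_edges, removed_edges):
    """Label each node with the connected component it falls in
    after deleting `removed_edges` from a spanning tree.

    Hypothetical helper for illustration; areas are nodes 0..n_nodes-1.
    """
    removed = {frozenset(e) for e in removed_edges}

    # Build the adjacency list of the tree without the deleted edges.
    adj = defaultdict(list)
    for u, v in tree_edges:
        if frozenset((u, v)) not in removed:
            adj[u].append(v)
            adj[v].append(u)

    # Depth-first search: each component becomes one cluster.
    labels = [-1] * n_nodes
    current = 0
    for start in range(n_nodes):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels


# Toy example (assumed data): six areas joined by a path-shaped spanning tree.
tree = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
labels = partition_from_tree(6, tree, removed_edges=[(1, 2), (3, 4)])
print(labels)  # [0, 0, 1, 1, 2, 2]
```

Because every tree edge links two neighboring areas, each component is spatially contiguous, and deleting k - 1 edges from a spanning tree always yields exactly k clusters.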
