Exploratory geospatial data analysis using the GeoSOM suite

Abstract: Clustering is one of the most popular and important tasks in data analysis. This holds for any type of data, and geographic data is no exception. In fact, the aim of geographic knowledge discovery is, more often than not, to explore the data and let spatial patterns surface rather than to develop predictive models. The size and dimensionality of existing and future databases stress the need for efficient and robust clustering algorithms. This need has been successfully addressed in the context of general-purpose knowledge discovery. Geographic knowledge discovery, nonetheless, can still benefit from better tools, especially tools that integrate geographic information and aspatial variables to support the geographic analyst’s objectives and needs. Typically, these objectives relate to finding spatial patterns based on the interaction between location and aspatial variables. When performing cluster-based analysis of geographic data, user interaction is essential to understand and explore the emerging patterns, and the lack of appropriate tools for this task hinders much otherwise sound work. In this paper, we present the GeoSOM suite, a tool designed to bridge the gap between clustering and the typical objectives and needs of geographic information science. The GeoSOM suite implements the GeoSOM algorithm, which modifies the traditional Self-Organizing Map algorithm to take geographic information into account explicitly. We present a case study, based on census data from Lisbon, that explores the features of the GeoSOM suite and exemplifies its use in exploratory data analysis.
