A MapReduce algorithm to create contiguity weights for spatial analysis of big data

Spatial analysis of big data is a key component of Cyber-GIS. However, how to utilize existing cyberinfrastructure (e.g., large computing clusters) to perform parallel and distributed spatial analysis on big data remains a significant challenge. Problems such as efficient spatial weights creation, spatial statistics, and spatial regression on big data still require investigation. In this research, we propose a MapReduce algorithm for creating contiguity-based spatial weights. The algorithm creates spatial weights from very large spatial datasets efficiently by using computing resources organized within the Hadoop framework. It follows the MapReduce paradigm: mappers are distributed across the computing cluster to find contiguous neighbors in parallel, and reducers then collect the results and generate the weights matrix. To test the performance of this algorithm, we design experiments that create contiguity-based weights matrices from artificial spatial data with up to 190 million polygons using Amazon's Hadoop service, Elastic MapReduce. The experiments demonstrate the scalability of this parallel algorithm, which utilizes large computing clusters to solve the problem of creating contiguity weights for big data.
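
To make the map and reduce phases concrete, the sketch below illustrates one common way to find queen-contiguous neighbors in this paradigm: the mapper keys each polygon by its boundary vertices, and the reducer pairs up all polygons that share a vertex. This is a simplified illustration rather than the algorithm proposed here; the "id<TAB>x1,y1 x2,y2 ..." input format, the vertex-keying scheme, and the local simulation of Hadoop's shuffle phase are assumptions made so the example is self-contained.

```python
# A minimal, self-contained sketch of the map/reduce phases described above.
# It is NOT the paper's implementation: the input format, the vertex-keying
# scheme (queen contiguity), and the local shuffle simulation are assumptions
# made to keep the example runnable on a single machine.
from collections import defaultdict
from itertools import combinations


def mapper(line):
    """Emit (vertex, polygon_id) for every vertex of the polygon on this line."""
    poly_id, coords = line.rstrip("\n").split("\t")
    for vertex in coords.split():
        yield vertex, poly_id


def reducer(vertex, poly_ids):
    """Any two polygons that share a vertex are queen-contiguous neighbors."""
    for a, b in combinations(sorted(set(poly_ids)), 2):
        yield a, b


def run_local(lines):
    """Stand-in for Hadoop's shuffle/sort phase so the sketch runs locally."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    neighbors = set()
    for key, values in groups.items():
        neighbors.update(reducer(key, values))
    return sorted(neighbors)


if __name__ == "__main__":
    sample = [
        "A\t0,0 1,0 1,1 0,1",  # unit square
        "B\t1,0 2,0 2,1 1,1",  # shares the edge x=1 with A
        "C\t0,1 1,1 1,2 0,2",  # shares the edge y=1 with A, touches B at (1,1)
    ]
    for a, b in run_local(sample):
        print(f"{a} -- {b}")
```

On an actual cluster, the mapper and reducer would run as distributed Hadoop tasks over the input splits, and a further pass would assemble the emitted neighbor pairs into the final weights matrix.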