A MapReduce algorithm to create contiguity weights for spatial analysis of big data

Spatial analysis of big data is a key component of Cyber-GIS. However, how to utilize existing cyberinfrastructure (e.g., large computing clusters) to perform parallel and distributed spatial analysis on big data remains a significant challenge. Problems such as efficient spatial weights creation, spatial statistics, and spatial regression on big data still require investigation. In this research, we propose a MapReduce algorithm for creating contiguity-based spatial weights. The algorithm creates spatial weights from very large spatial datasets efficiently by using computing resources organized within the Hadoop framework. It follows the MapReduce paradigm: mappers are distributed across the computing cluster to find contiguous neighbors in parallel, and reducers then collect the results and generate the weights matrix. To test the performance of this algorithm, we design experiments that create contiguity-based weights matrices from artificial spatial data with up to 190 million polygons using Amazon's Hadoop service, Elastic MapReduce. The experiments demonstrate the scalability of this parallel algorithm, which utilizes large computing clusters to solve the problem of creating contiguity weights for big data.
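
To make the map and reduce phases concrete, the sketch below illustrates one common way to find queen-contiguous neighbors in this paradigm: the mapper keys each polygon by its boundary vertices, and the reducer pairs up all polygons that share a vertex. This is a simplified illustration rather than the algorithm proposed here; the "id<TAB>x1,y1 x2,y2 ..." input format, the vertex-keying scheme, and the local simulation of Hadoop's shuffle phase are assumptions made so the example is self-contained.

```python
# A minimal, self-contained sketch of the map/reduce phases described above.
# It is NOT the paper's implementation: the input format, the vertex-keying
# scheme (queen contiguity), and the local shuffle simulation are assumptions
# made to keep the example runnable on a single machine.
from collections import defaultdict
from itertools import combinations


def mapper(line):
    """Emit (vertex, polygon_id) for every vertex of the polygon on this line."""
    poly_id, coords = line.rstrip("\n").split("\t")
    for vertex in coords.split():
        yield vertex, poly_id


def reducer(vertex, poly_ids):
    """Any two polygons that share a vertex are queen-contiguous neighbors."""
    for a, b in combinations(sorted(set(poly_ids)), 2):
        yield a, b


def run_local(lines):
    """Stand-in for Hadoop's shuffle/sort phase so the sketch runs locally."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    neighbors = set()
    for key, values in groups.items():
        neighbors.update(reducer(key, values))
    return sorted(neighbors)


if __name__ == "__main__":
    sample = [
        "A\t0,0 1,0 1,1 0,1",  # unit square
        "B\t1,0 2,0 2,1 1,1",  # shares the edge x=1 with A
        "C\t0,1 1,1 1,2 0,2",  # shares the edge y=1 with A, touches B at (1,1)
    ]
    for a, b in run_local(sample):
        print(f"{a} -- {b}")
```

On an actual cluster, the mapper and reducer would run as distributed Hadoop tasks over the input splits, and a further pass would assemble the emitted neighbor pairs into the final weights matrix.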