Voronoi-Based Geospatial Query Processing with MapReduce

Geospatial queries (GQs) are used in a wide variety of applications, such as decision support systems, profile-based marketing, bioinformatics, and GIS. Most existing query-answering approaches assume centralized processing on a single machine, although GQs are intrinsically parallelizable. Some approaches have been designed for parallel databases and cluster systems; however, they apply only to systems with limited parallel processing capability, far from that of cloud-based platforms. In this paper, we study the problem of parallel geospatial query processing with the MapReduce programming model. Our proposed approach creates a spatial index, a Voronoi diagram, for given data points in 2D space and enables efficient processing of a wide range of GQs. We evaluated the performance of our proposed techniques and compared them with their closest related work while varying the number of employed nodes.
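As a rough illustration of how a geospatial query decomposes into map and reduce steps, the sketch below answers a nearest-neighbor query over partitioned 2D points. It relies on the defining property of a Voronoi diagram: the data point whose cell contains a query location is that location's nearest neighbor. This is not the paper's algorithm, which the abstract does not spell out. It builds no Voronoi index, it simulates the two phases in a single process, and every function name and the round-robin partitioning are assumptions made for illustration.

```python
from functools import reduce
from math import dist


def split(points, n_parts):
    """Round-robin split standing in for the input splits of a MapReduce job (assumption)."""
    return [points[i::n_parts] for i in range(n_parts)]


def map_local_nearest(partition, query):
    """Map step: emit the (distance, point) pair closest to the query within one partition."""
    return min((dist(p, query), p) for p in partition)


def reduce_global_nearest(a, b):
    """Reduce step: keep the closer of two (distance, point) candidates."""
    return a if a[0] <= b[0] else b


def nearest_generator(points, query, n_parts=3):
    """Return the data point whose Voronoi cell contains the query location."""
    # Skip empty partitions, which occur when n_parts exceeds the number of points.
    parts = [part for part in split(points, n_parts) if part]
    candidates = [map_local_nearest(part, query) for part in parts]
    return reduce(reduce_global_nearest, candidates)[1]


points = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0), (2.0, 2.0)]
print(nearest_generator(points, (3.5, 0.5)))  # (4.0, 0.0)
```

Each map call touches only its own partition, so the partitions could be processed on separate nodes, with the reduce step combining one candidate per partition.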
