MRSLICE: Efficient RkNN Query Processing in SpatialHadoop

With the continuously increasing volume of spatial data, it is difficult to execute spatial queries efficiently in spatial data-intensive applications because of the limited computational capability and storage resources of centralized environments. As a result, shared-nothing spatial cloud infrastructures have received increasing attention in recent years. SpatialHadoop is a full-fledged MapReduce framework with native support for spatial data. It also supports spatial indexing on top of Hadoop to perform spatial queries efficiently (e.g., k-Nearest Neighbor search, spatial intersection join, etc.). The Reverse k-Nearest Neighbor (RkNN) problem, i.e., finding all objects in a dataset that have a given query point among their k nearest neighbors, has been studied extensively in recent years. RkNN queries are of particular interest in a wide range of applications, such as decision support systems, resource allocation, profile-based marketing, and location-based services. In this paper, we present the design and implementation of a MapReduce algorithm for RkNN queries, called MRSLICE, in SpatialHadoop. We have evaluated the performance of the MRSLICE algorithm on SpatialHadoop with large real-world datasets. The experiments demonstrate the efficiency and scalability of our proposal in comparison with other RkNN query MapReduce algorithms in SpatialHadoop.
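
To make the RkNN definition above concrete, the following is a minimal brute-force sketch in Python. It is not the MRSLICE algorithm and does not use SpatialHadoop or MapReduce; it only illustrates what an RkNN query returns. The function name `rknn_brute_force` and the tie-handling rule (a point counts as closer only if it is strictly closer than the query) are assumptions made for illustration.

```python
from math import dist

def rknn_brute_force(points, q, k):
    """Return the points p for which q is among p's k nearest neighbors.

    Illustrative O(n^2) baseline: the neighbors of p are drawn from the
    other dataset points plus the query point q (an assumed convention).
    """
    result = []
    for i, p in enumerate(points):
        d_pq = dist(p, q)
        # Count dataset points (other than p itself) strictly closer to p than q.
        closer = sum(
            1 for j, o in enumerate(points) if j != i and dist(p, o) < d_pq
        )
        # q is one of p's k nearest neighbors if fewer than k points beat it.
        if closer < k:
            result.append(p)
    return result

points = [(0, 0), (1, 0), (5, 0), (6, 0)]
print(rknn_brute_force(points, (2, 0), 1))
```

Each candidate point is compared against the whole dataset, so the cost is quadratic in the dataset size. This cost is what pruning-based and distributed approaches such as the one presented here aim to avoid.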
