Efficient Multi-dimensional Spatial RkNN Query Processing with MapReduce

Reverse k Nearest Neighbor (RkNN) queries are of particular interest in a wide range of data mining applications, such as decision support systems, profile-based marketing, and spatial databases. As the volume of spatial data grows, it becomes difficult to perform RkNN queries efficiently because of limited computational capability and storage resources. In this paper, we investigate how to perform distributed RkNN queries using MapReduce. First, we present Basic-MRRkNN, a query method based on an inverted grid index over large-scale spatial datasets. Second, we propose an optimization, the Lazy-MRRkNN query algorithm, which prunes the search space when all data points are discovered. To the best of our knowledge, this is the first work to propose exact RkNN processing algorithms using MapReduce on multi-dimensional datasets. Extensive experiments on both real and synthetic datasets demonstrate that the proposed methods are efficient and scalable.
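To make the query semantics concrete, the following is a minimal single-machine sketch in Python. It is not the Basic-MRRkNN or Lazy-MRRkNN algorithm. It only illustrates (a) the usual monochromatic definition of an RkNN query, under which a point p belongs to the result if fewer than k other data points lie strictly closer to p than the query q does, and (b) one plausible way to bucket points into grid cells, the kind of key a map phase might emit when building an inverted grid index. The function names, the uniform `cell_size` parameter, and the tie-handling rule are assumptions made for illustration.

```python
from collections import defaultdict
from math import dist, floor


def grid_cell(point, cell_size):
    """Return the integer grid-cell id of a point (assumed uniform grid)."""
    return tuple(floor(c / cell_size) for c in point)


def build_inverted_grid(points, cell_size):
    """Group points by cell id: a hypothetical cell -> points inverted index."""
    grid = defaultdict(list)
    for p in points:
        grid[grid_cell(p, cell_size)].append(p)
    return dict(grid)


def brute_force_rknn(points, q, k):
    """Exact RkNN by definition, O(n^2): keep p if fewer than k other
    data points are strictly closer to p than the query q is."""
    result = []
    for i, p in enumerate(points):
        d_q = dist(p, q)
        closer = sum(
            1 for j, o in enumerate(points) if j != i and dist(p, o) < d_q
        )
        if closer < k:
            result.append(p)
    return result


points = [(0, 0), (1, 0), (5, 5), (6, 5)]
rknn = brute_force_rknn(points, q=(0.5, 0), k=1)
grid = build_inverted_grid(points, cell_size=2)
```

In this toy dataset, the two points near the query have it as their nearest neighbor, while the two distant points are each other's nearest neighbor and are therefore excluded. The quadratic cost of the brute-force check is what motivates partitioning and pruning in a distributed setting.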
