Inverted Voronoi-Based kNN Query Processing with MapReduce

In mobile cloud computing environments, distributed k-nearest neighbor (kNN) query processing is an important problem. We consider the problem of processing kNN queries over large data sets whose index is jointly maintained by a set of machines in a computing cluster. The kNN query is a primitive operator that is widely used in fields such as knowledge discovery, data mining, and spatial databases. A scalable, distributed spatial data index plays an important role in answering kNN queries efficiently. Existing MapReduce-based approaches build distributed indexes and process kNN queries in different ways, e.g., with R-tree-based or grid-based indexes. However, the R-tree is difficult to parallelize, and the grid is a many-to-many index, which can lead to data redundancy. In this paper, we introduce a distributed kNN query method based on the MapReduce programming model. First, we propose a distributed method for building a novel spatial data index, the Inverted Voronoi Index, which combines an inverted index with a Voronoi diagram. Next, we propose an efficient kNN query processing algorithm that is based on the Voronoi diagram and runs on MapReduce. Finally, we present the results of extensive experiments on both real and synthetic data sets, which demonstrate the efficiency and scalability of the proposed approach.
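To make the idea concrete, the following is a minimal Python sketch under stated assumptions; it is not the paper's implementation. The Voronoi generators (`sites`) are assumed to be chosen in advance, for example as a sample of the data. Index construction is written as an in-memory map, shuffle, and reduce: the map step assigns each point to the cell of its nearest site, and the reduce step turns each cell into one posting list, so the index maps a cell id to the points inside that cell. For the query, the sketch swaps in a simple pruning rule instead of walking Voronoi-neighbor cells: if q's nearest site is s_h, no point in the cell of another site s_o can be closer to q than (d(q, s_o)^2 - d(q, s_h)^2) / (2 d(s_h, s_o)), the distance from q to their perpendicular bisector. Cells are scanned in order of that bound, and the scan stops once the bound reaches the current k-th distance. All function names (`map_phase`, `shuffle`, `reduce_phase`, `build_inverted_voronoi_index`, `bisector_bound`, `knn_query`) are hypothetical and are not Hadoop or any other library API.

```python
import heapq
import math
from collections import defaultdict


def nearest_site(p, sites):
    """Index of the Voronoi site (generator) closest to point p."""
    return min(range(len(sites)), key=lambda i: math.dist(p, sites[i]))


def map_phase(points, sites):
    """Map: emit (cell_id, (point_id, point)); cell_id is the nearest site."""
    for pid, p in enumerate(points):
        yield nearest_site(p, sites), (pid, p)


def shuffle(pairs):
    """Shuffle: group intermediate values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: each Voronoi cell becomes one posting list of the inverted index."""
    return {cell: sorted(values) for cell, values in groups.items()}


def build_inverted_voronoi_index(points, sites):
    # Hypothetical helper: sites are assumed to be fixed before the job runs.
    return reduce_phase(shuffle(map_phase(points, sites)))


def bisector_bound(q, s_home, s_other):
    """Lower bound on dist(q, p) for every p in the cell of s_other.

    It is the distance from q to the perpendicular bisector of s_home
    (q's nearest site) and s_other, which separates q from that cell.
    """
    d = math.dist(s_home, s_other)
    if d == 0:
        return 0.0  # q's own cell (or a coincident site): no pruning possible
    gap = math.dist(q, s_other) ** 2 - math.dist(q, s_home) ** 2
    return max(0.0, gap / (2 * d))


def knn_query(q, k, index, sites):
    """Exact k nearest neighbors of q as sorted (distance, point_id) pairs."""
    if k <= 0:
        return []
    home = sites[nearest_site(q, sites)]
    # Scan non-empty cells in order of increasing lower bound (home cell first).
    cells = sorted((bisector_bound(q, home, sites[c]), c) for c in index)
    heap = []  # max-heap via negated distances: the best k found so far
    for bound, cell in cells:
        if len(heap) == k and bound >= -heap[0][0]:
            break  # every remaining cell is at least this far from q
        for pid, p in index[cell]:
            d = math.dist(q, p)
            if len(heap) < k:
                heapq.heappush(heap, (-d, pid))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, pid))
    return sorted((-neg_d, pid) for neg_d, pid in heap)
```

Because every point lands in exactly one posting list, this index stores each point once, in contrast to the many-to-many grid index discussed above. In a real deployment the posting lists would live in distributed storage and the cell scan could itself be expressed as a MapReduce job; the sketch keeps everything in one process for readability.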
