Scalable nearest neighbor query processing based on Inverted Grid Index

With the increasing availability of Location-Based Services (LBS) and the mobile internet, the volume of spatial data is growing rapidly. This poses new requirements and challenges for distributed indexing and query processing on large-scale spatial data. A scalable, distributed spatial index is essential for efficient Nearest Neighbor (NN) queries. Several existing approaches implement distributed indices and NN query processing with MapReduce, such as those based on R-trees and Voronoi diagrams. However, R-trees are ill-suited to parallelization, and Voronoi-based indices require extra computation for localization or local index reconstruction. In this paper, we investigate how to perform NN queries in a distributed environment. First, we present distributed approaches for constructing a novel distributed spatial index, the Inverted Grid Index, which combines an inverted index with grid partitioning. Second, we describe implementations of two typical applications built on this index in a cloud computing environment: distributed k Nearest Neighbor (kNN) and Reverse Nearest Neighbor (RNN) queries. Finally, we evaluate the effectiveness of our algorithms through extensive experiments on both real and synthetic data sets. The experiments show that index construction time decreases almost linearly as the number of cluster nodes increases. The results also demonstrate the efficiency and scalability of our NN query algorithms based on the Inverted Grid Index.
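
The abstract describes the index only at a high level. The Python sketch below illustrates the general idea under our own assumptions, not the authors' implementation: a uniform grid with a hypothetical cell size `CELL_SIZE`, the MapReduce phases simulated in-process by the hypothetical functions `map_phase` (emit each point under its grid cell) and `reduce_phase` (group points into one inverted list per cell), a ring-by-ring kNN search over those per-cell lists, and a naive monochromatic RNN filter built on that kNN search. The RNN function issues one kNN query per data point and applies no pruning, which a practical implementation would avoid.

```python
import math
from collections import defaultdict

# Hypothetical parameter: side length of every square grid cell.
CELL_SIZE = 10.0


def cell_of(x, y, size=CELL_SIZE):
    """Map a coordinate to the integer (column, row) of its grid cell."""
    return (math.floor(x / size), math.floor(y / size))


def map_phase(points, size=CELL_SIZE):
    """Simulated map step: emit (cell, point) for every (id, x, y) record."""
    for pid, x, y in points:
        yield cell_of(x, y, size), (pid, x, y)


def reduce_phase(pairs):
    """Simulated shuffle and reduce: one inverted list of points per cell."""
    index = defaultdict(list)
    for cell, point in pairs:
        index[cell].append(point)
    return dict(index)


def knn(index, qx, qy, k, size=CELL_SIZE):
    """Return up to k (distance, id) pairs nearest to (qx, qy), ring by ring."""
    if not index or k <= 0:
        return []
    cx, cy = cell_of(qx, qy, size)
    # Ring radius (in cells) that is enough to reach every non-empty cell.
    max_r = max(max(abs(col - cx), abs(row - cy)) for col, row in index)
    candidates = []
    for r in range(max_r + 1):
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                if max(abs(dx), abs(dy)) != r:
                    continue  # inner cells were scanned in earlier rings
                for pid, x, y in index.get((cx + dx, cy + dy), []):
                    candidates.append((math.dist((qx, qy), (x, y)), pid))
        candidates.sort()
        # Every unscanned point lies more than r * size away from the query,
        # so once the k-th candidate is within that radius the answer is final.
        if len(candidates) >= k and candidates[k - 1][0] <= r * size:
            break
    return candidates[:k]


def rnn(index, points, qx, qy, size=CELL_SIZE):
    """Naive monochromatic RNN: ids of points that are at least as close to
    the query as to their nearest other data point."""
    result = []
    for pid, x, y in points:
        # k=2 because the point itself is stored in the index at distance 0.
        others = [d for d, other in knn(index, x, y, 2, size) if other != pid]
        nn_dist = others[0] if others else math.inf
        if math.dist((x, y), (qx, qy)) <= nn_dist:
            result.append(pid)
    return result


if __name__ == "__main__":
    pts = [(1, 2.0, 3.0), (2, 4.0, 4.0), (3, 25.0, 30.0),
           (4, 27.0, 31.0), (5, 60.0, 5.0)]
    idx = reduce_phase(map_phase(pts))
    print(sorted(idx))                                 # [(0, 0), (2, 3), (6, 0)]
    print([pid for _, pid in knn(idx, 3.0, 3.0, 2)])   # [1, 2]
    print(rnn(idx, pts, 3.0, 3.0))                     # [1, 2]
    print(rnn(idx, pts, 50.0, 5.0))                    # [5]
```

Because cells are visited in order of increasing Chebyshev distance from the query cell, the kNN loop can stop as soon as the k-th candidate is closer than any point in an unscanned cell could be, so a query usually touches only the cells near its own.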
