Nearest group queries

k nearest neighbor (kNN) search is an important problem in a wide range of applications, including clustering, pattern recognition, image retrieval and recommendation systems. It finds the k elements from a data source D that are closest to a given query point q in a metric space. In this paper, we extend the kNN query to retrieve the closest elements from multiple data sources. This new type of query, named the k nearest group (kNG) query, finds the k groups of elements that are closest to q, with each group containing one object from each data source. The kNG query is useful in many location-based services. To process kNG queries efficiently, we propose a baseline algorithm using the R-tree as well as an improved version using the Hilbert R-tree. We also study a variant of the kNG query, named kNG Join, which is analogous to kNN Join. Given a set of query points Q, kNG Join returns the k nearest groups for each point in Q. Such a query is useful in publish/subscribe systems to find matching items for a collection of subscribers. A comprehensive performance study was conducted on both synthetic and real datasets, and the experimental results show that the Hilbert R-tree achieves significantly better performance than the R-tree in answering both the kNG query and kNG Join.
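To make the two query definitions concrete, the following is a minimal brute-force sketch, not the R-tree or Hilbert R-tree algorithms proposed in the paper. The abstract does not say how the distance between q and a group is measured, so the sketch assumes it is the sum of Euclidean distances from q to each group member. The names `group_cost`, `kng_query` and `kng_join` are hypothetical and chosen only for illustration.

```python
import heapq
import itertools
import math


def group_cost(q, group):
    # ASSUMPTION: the group-to-query distance is the sum of Euclidean
    # distances from q to each member. The abstract does not define it.
    return sum(math.dist(q, p) for p in group)


def kng_query(q, sources, k):
    # Enumerate every group that takes exactly one object from each data
    # source, then keep the k groups with the smallest assumed cost.
    candidates = itertools.product(*sources)
    return heapq.nsmallest(k, candidates, key=lambda g: group_cost(q, g))


def kng_join(queries, sources, k):
    # kNG Join: answer a kNG query for every point in the query set Q.
    return {q: kng_query(q, sources, k) for q in queries}


if __name__ == "__main__":
    # Two toy data sources of 2-D points (illustrative data only).
    sources = [[(0, 0), (5, 5)], [(1, 0), (10, 10)]]
    print(kng_query((0, 0), sources, 2))
    print(kng_join([(0, 0), (10, 10)], sources, 1))
```

The exhaustive enumeration examines a number of groups equal to the product of the data source sizes, which is why an index-based approach such as the ones the paper proposes is needed for anything beyond toy inputs.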
