Two ellipse-based pruning methods for group nearest neighbor queries

Group nearest neighbor (GNN) queries are a relatively new type of operation in spatial database applications. Unlike a traditional kNN query, which specifies a single query point, a GNN query has multiple query points. Because the query points can be numerous and arbitrarily distributed in the data space, a GNN query is much more complex than a kNN query. In this paper, we propose two pruning strategies for GNN queries that take the distribution of the query points into account. Our methods use an ellipse to approximate the extent of the query points, and then derive either a distance or a minimum bounding rectangle (MBR) from that ellipse to prune intermediate nodes during a depth-first search of an R$^*$-tree. The methods also apply to the best-first traversal paradigm. Extensive performance studies show that the proposed pruning strategies are more efficient than existing methods.
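To make the setting concrete, the following is a minimal sketch of a GNN query and of node pruning in general. It assumes the common formulation in which the answer minimizes the sum of Euclidean distances to all query points, and it uses a generic per-query-point MBR lower bound, not the ellipse-based bounds proposed here. All function names (`group_distance`, `gnn_brute_force`, `mindist`, `can_prune`) are hypothetical and chosen for illustration only.

```python
import math

def group_distance(p, queries):
    """Sum of Euclidean distances from point p to every query point
    (assumed aggregate; the abstract does not fix the distance function)."""
    return sum(math.dist(p, q) for q in queries)

def gnn_brute_force(points, queries, k=1):
    """Baseline GNN: scan all data points and keep the k with the
    smallest group distance. No index, no pruning."""
    return sorted(points, key=lambda p: group_distance(p, queries))[:k]

def mindist(q, mbr):
    """Minimum Euclidean distance from point q to an axis-aligned
    rectangle given as ((xmin, ymin), (xmax, ymax))."""
    (xmin, ymin), (xmax, ymax) = mbr
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)

def can_prune(mbr, queries, best_so_far):
    """Generic pruning test for an index node: the sum of per-query
    mindist values never exceeds the group distance of any point inside
    the MBR, so the node can be skipped if that sum already reaches the
    best group distance found so far."""
    lower_bound = sum(mindist(q, mbr) for q in queries)
    return lower_bound >= best_so_far

if __name__ == "__main__":
    points = [(0, 0), (5, 5), (2, 1)]
    queries = [(1, 1), (3, 1)]
    best = gnn_brute_force(points, queries)[0]
    print(best, group_distance(best, queries))
    print(can_prune(((4, 4), (6, 6)), queries, group_distance(best, queries)))
```

The generic test costs one `mindist` evaluation per query point for every visited node. A single summary shape for the whole query group, such as the ellipse described above, is one way to take the distribution of the query points into account when deciding whether a node can be skipped.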
