On efficient mutual nearest neighbor query processing in spatial databases

This paper studies a new form of nearest neighbor query in spatial databases, namely, mutual nearest neighbor (MNN) search. Given a set D of objects and a query object q, an MNN query returns the set of objects in D that are among the k₁ (≥ 1) nearest neighbors (NNs) of q and, at the same time, have q as one of their k₂ (≥ 1) NNs. Although MNN queries are useful in many applications involving decision making, data mining, and pattern recognition, they cannot be handled efficiently by existing spatial query processing approaches. In this paper, we present the first work on processing MNN queries efficiently. Our methods use a conventional data-partitioning index (e.g., an R-tree) on the dataset. They employ state-of-the-art database techniques, including best-first k nearest neighbor (kNN) retrieval and reverse kNN search with TPL pruning, and they exploit batch processing and reuse techniques. An extensive empirical study on both real and synthetic datasets demonstrates the efficiency and effectiveness of the proposed algorithms under various experimental settings.
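Stated compactly, the query returns MNN(q) = { p ∈ D : p ∈ kNN(q, k₁) and q ∈ kNN(p, k₂) }. The sketch below illustrates only this definition. It uses a naive linear-scan, filter-and-verify evaluation and does not reproduce the R-tree, best-first kNN, or TPL-based algorithms described above.

The sketch rests on several assumptions that the abstract does not fix:

- D is a list of 2-D points.
- Distance is Euclidean.
- q is not itself a member of D.
- When checking whether q is among p's k₂ NNs, the competing objects are D without p.
- Ties are resolved in q's favor.

The function name `mnn_naive` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mnn_naive(data: Sequence[Point], q: Point, k1: int, k2: int) -> List[Point]:
    """Naive MNN evaluation by linear scan (illustrative, not the paper's method).

    Returns the points p in `data` such that p is among the k1 nearest
    neighbours of q AND q is among the k2 nearest neighbours of p.
    """
    # Filter step: rank all objects by distance to q and keep the k1 closest.
    # sorted() is stable, so equal-distance objects keep their input order.
    order = sorted(range(len(data)), key=lambda i: math.dist(q, data[i]))

    result: List[Point] = []
    for i in order[:k1]:
        p = data[i]
        d_pq = math.dist(p, q)
        # Verification step (assumption: p's competitors are D minus p itself,
        # and ties are resolved in q's favour): q is one of p's k2 NNs iff
        # fewer than k2 other objects are strictly closer to p than q is.
        closer = sum(
            1
            for j, o in enumerate(data)
            if j != i and math.dist(p, o) < d_pq
        )
        if closer < k2:
            result.append(p)
    return result


if __name__ == "__main__":
    D = [(1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (0.0, 3.0)]
    print(mnn_naive(D, (0.0, 0.0), k1=2, k2=1))  # [(1.0, 0.0)]
```

In the example, (2.0, 0.0) is among the two NNs of the query point. It is still excluded when k₂ = 1, because (1.0, 0.0) is closer to it than the query is. This quadratic scan is what an index-based approach aims to avoid. Per the abstract, the paper replaces the filter step with best-first kNN retrieval on an R-tree and prunes the verification step with reverse kNN (TPL) techniques.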