Efficient Algorithms and Cost Models for Reverse Spatial-Keyword k-Nearest Neighbor Search

Geographic objects associated with descriptive texts are becoming prevalent, justifying the need for spatial-keyword queries that consider both the locations and the textual descriptions of objects. Specifically, the relevance of an object to a query is measured by a spatial-textual similarity that combines spatial proximity and textual similarity. In this article, we introduce the Reverse Spatial-Keyword k-Nearest Neighbor (RSKkNN) query, which finds those objects that have the query as one of their k nearest spatial-textual objects. RSKkNN queries have numerous applications in online maps and GIS decision support systems. To answer RSKkNN queries efficiently, we propose a hybrid index tree, called the IUR-tree (Intersection-Union R-tree), that effectively combines location proximity with textual similarity. We then design a branch-and-bound search algorithm based on the IUR-tree. To accelerate query processing, we improve the IUR-tree by leveraging the distribution of textual descriptions, leading to two variants, the Clustered IUR-tree (CIUR-tree) and the Combined Clustered IUR-tree (C2IUR-tree), for each of which we develop optimized algorithms. We also provide a theoretical cost model to analyze the efficiency of our algorithms. Our empirical studies show that the proposed algorithms are efficient and scalable.
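The following brute-force baseline is a sketch of what an RSKkNN query returns. It is not the IUR-tree algorithm. The abstract does not give the exact similarity formula, so this sketch assumes one. It takes a weighted sum (hypothetical weight `alpha`) of two terms: spatial proximity normalized by the largest pairwise distance, and extended Jaccard similarity over sparse term-weight vectors. The names `sim_st`, `extended_jaccard`, `rskknn_bruteforce` and `SpatialTextObject` are illustrative. The baseline compares every object against every other object in O(n^2) similarity evaluations, which is the cost that an index with branch-and-bound pruning is meant to avoid.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpatialTextObject:
    oid: str
    x: float
    y: float
    terms: Dict[str, float]  # hypothetical term -> weight vector (e.g. TF-IDF)


def extended_jaccard(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Extended Jaccard similarity: dot / (|a|^2 + |b|^2 - dot)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sum(w * w for w in a.values())
    nb = sum(w * w for w in b.values())
    denom = na + nb - dot
    return dot / denom if denom > 0 else 0.0


def sim_st(o1, o2, alpha: float, max_dist: float) -> float:
    """Assumed spatial-textual similarity: alpha * proximity + (1 - alpha) * text."""
    dist = math.hypot(o1.x - o2.x, o1.y - o2.y)
    proximity = 1.0 - dist / max_dist if max_dist > 0 else 1.0
    return alpha * proximity + (1.0 - alpha) * extended_jaccard(o1.terms, o2.terms)


def rskknn_bruteforce(objects: List[SpatialTextObject], query: SpatialTextObject,
                      k: int, alpha: float = 0.5) -> List[str]:
    """Ids of objects that have `query` among their k most similar objects."""
    everything = objects + [query]
    # Normalizer (an assumption): the largest pairwise distance in the data.
    max_dist = max(math.hypot(a.x - b.x, a.y - b.y)
                   for a in everything for b in everything)
    result = []
    for o in objects:
        sim_q = sim_st(o, query, alpha, max_dist)
        # Count other objects strictly more similar to o than the query is;
        # the query is in o's top-k iff fewer than k such objects exist
        # (ties are resolved in the query's favor).
        better = sum(1 for p in objects
                     if p is not o and sim_st(o, p, alpha, max_dist) > sim_q)
        if better < k:
            result.append(o.oid)
    return result


objects = [
    SpatialTextObject("A", 0.0, 0.0, {"coffee": 1.0}),
    SpatialTextObject("B", 1.0, 0.0, {"coffee": 1.0, "wifi": 1.0}),
    SpatialTextObject("C", 10.0, 10.0, {"pizza": 1.0}),
]
query = SpatialTextObject("q", 0.5, 0.0, {"coffee": 1.0})

if __name__ == "__main__":
    print(rskknn_bruteforce(objects, query, k=1))  # ['A', 'B']
```

In this toy data, C is far away and shares no terms with the query. B is more similar to C than the query is to C, so the query is not C's nearest spatial-textual neighbor and C is excluded when k = 1.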