Efficient Algorithms and Cost Models for Reverse Spatial-Keyword k-Nearest Neighbor Search

Geographic objects associated with descriptive text are becoming prevalent, which motivates spatial-keyword queries that consider both the locations and the textual descriptions of objects. The relevance of an object to a query is measured by a spatial-textual similarity that combines spatial proximity and textual similarity. In this article, we introduce the Reverse Spatial-Keyword k-Nearest Neighbor (RSKkNN) query, which finds the objects that have the query as one of their k nearest spatial-textual neighbors. RSKkNN queries have numerous applications in online maps and GIS decision support systems. To answer RSKkNN queries efficiently, we propose a hybrid index tree, the IUR-tree (Intersection-Union R-tree), which effectively combines location proximity with textual similarity, and we design a branch-and-bound search algorithm based on it. To further accelerate query processing, we improve the IUR-tree by exploiting the distribution of the textual descriptions. This leads to two variants, the Clustered IUR-tree (CIUR-tree) and the Combined Clustered IUR-tree (C2IUR-tree), and we develop optimized algorithms for each. We also provide a theoretical cost model to analyze the efficiency of our algorithms. Our empirical studies show that the proposed algorithms are efficient and scalable.
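The following minimal brute-force sketch in Python makes the query semantics concrete. It is not the IUR-tree algorithm. It checks every object directly, which is the baseline that index-based pruning is meant to avoid. The abstract does not give the exact similarity formula, so `sim_st` assumes a hypothetical weighted sum with weight `alpha`. The spatial part is one minus the Euclidean distance normalized by the bounding-box diagonal, and the textual part is the cosine similarity of sparse term-weight vectors. The names `SpatialTextObject` and `rskknn_brute_force` are also illustrative assumptions, as is the tie-breaking rule (ties favor the query).

```python
import math
from dataclasses import dataclass


@dataclass
class SpatialTextObject:
    """Hypothetical geo-tagged object: a 2-D location plus a sparse term-weight vector."""
    x: float
    y: float
    terms: dict  # term -> weight (e.g. a TF-IDF weight); assumed representation


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse term-weight vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def sim_st(o, p, alpha: float, max_dist: float) -> float:
    """Assumed spatial-textual similarity: alpha * spatial proximity
    + (1 - alpha) * textual cosine similarity."""
    spatial = 1.0 - math.hypot(o.x - p.x, o.y - p.y) / max_dist
    return alpha * spatial + (1.0 - alpha) * cosine(o.terms, p.terms)


def rskknn_brute_force(objects, q, k: int, alpha: float = 0.5):
    """Return every object that ranks q among its k most similar objects.

    Baseline only: O(n^2) similarity evaluations, no index pruning."""
    pts = list(objects) + [q]
    xs = [o.x for o in pts]
    ys = [o.y for o in pts]
    # Diagonal of the bounding box of all objects and q, so spatial proximity
    # stays in [0, 1]; fall back to 1.0 if every point coincides.
    max_dist = math.hypot(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    result = []
    for i, o in enumerate(objects):
        s_q = sim_st(o, q, alpha, max_dist)
        # Count other objects strictly more similar to o than q is (ties favor q).
        closer = sum(
            1 for j, p in enumerate(objects)
            if j != i and sim_st(o, p, alpha, max_dist) > s_q
        )
        if closer < k:
            result.append(o)
    return result
```

For `n` objects, this sketch performs O(n²) similarity evaluations per query. An index that bounds spatial and textual similarity for entire subtrees, such as the IUR-tree, is intended to avoid that cost at scale.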