RASIM: a rank-aware separate index method for answering top-k spatial keyword queries

A top-k spatial keyword query returns the k objects with the highest (or lowest) scores, where the score combines spatial proximity and text relevance. Approaches for answering top-k spatial keyword queries fall into two categories: the separate index approach and the hybrid index approach. The separate index approach maintains the spatial index and the text index independently and can therefore accommodate new data types. However, it is difficult for this approach to support top-k pruning and efficient merging at the same time, because the two require different orders for clustering the objects: an order based on scores for top-k pruning and an order based on object IDs for efficient merging. In this paper, we propose a new separate index method, the Rank-Aware Separate Index Method (RASIM), for top-k spatial keyword queries. RASIM supports both top-k pruning and efficient merging by using a partitioning technique to cluster each separate index in both orders. Specifically, RASIM partitions the objects in each index into rank-aware (RA) groups of objects with similar scores, orders the groups by score (the first order), and orders the objects within each group by object ID (the second order). Based on the RA groups, we propose two query processing algorithms: (i) the External Threshold Algorithm (External TA), which supports top-k pruning in units of RA groups, and (ii) the Generalized External TA, which improves on External TA by exploiting special properties of the RA groups. RASIM is the first work to support top-k pruning within the separate index approach, and it retains the advantages of that approach. In addition, RASIM is more efficient in both storage and query processing time than the IR-tree method, which is based on the hybrid index approach and is the prevailing method for top-k pruning to date.
Experimental results show that, compared with the IR-tree method, RASIM reduces the index size by up to 1.85 times and improves query performance by up to 3.22 times.
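
The RA grouping and the group-at-a-time search can be illustrated with a small Python sketch. It is not the authors' implementation: the fixed-size chunking in `make_ra_groups`, the weighted-sum scoring with the `weights` parameter, the dictionary lookups standing in for random access, and all function names are hypothetical choices made for illustration, and the Generalized External TA is not modeled. Each index's objects are sorted by score and cut into RA groups; the groups keep descending score order while the IDs inside each group are sorted, so the current groups of the two indexes can be merged by ID. The search reads one RA group per index per round and stops once the k-th best score reaches the threshold formed from the score bounds of the next unread groups.

```python
import heapq
from dataclasses import dataclass


@dataclass
class RAGroup:
    upper: float    # highest score in the group: bound for every later group
    ids: list[int]  # object IDs in ascending order (the "second order")


def make_ra_groups(scores: dict[int, float], group_size: int) -> list[RAGroup]:
    """Partition one index's objects into RA groups.

    Hypothetical policy: sort by descending score and cut into chunks of
    group_size objects. Groups follow the score order; IDs inside a group
    are re-sorted ascending so groups can be merged by ID.
    """
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    groups = []
    for start in range(0, len(ranked), group_size):
        chunk = ranked[start:start + group_size]
        groups.append(RAGroup(upper=chunk[0][1],
                              ids=sorted(obj for obj, _ in chunk)))
    return groups


def external_ta(sources, weights, k):
    """TA-style top-k search doing sorted access one RA group at a time.

    sources: list of (groups, lookup) pairs, one per separate index; the dict
    `lookup` stands in for random access (missing object -> score 0).
    weights: non-negative weights of the monotone weighted-sum score.
    Returns up to k (object_id, score) pairs, best first.
    """
    def full_score(obj):
        return sum(w * lookup.get(obj, 0.0)
                   for w, (_, lookup) in zip(weights, sources))

    seen = set()
    top = []  # min-heap of (score, -object_id), at most k entries
    depth = 0
    while True:
        streams = [groups[depth].ids for groups, _ in sources
                   if depth < len(groups)]
        if not streams:
            break  # every index has been read completely
        # heapq.merge yields IDs in ascending order, so an object present in
        # several current groups shows up as adjacent duplicates.
        for obj in heapq.merge(*streams):
            if obj in seen:
                continue
            seen.add(obj)
            entry = (full_score(obj), -obj)
            if len(top) < k:
                heapq.heappush(top, entry)
            elif entry > top[0]:
                heapq.heapreplace(top, entry)
        depth += 1
        # Any unseen object scores at most the weighted bounds of the next
        # unread group of each index (0 for an exhausted index).
        threshold = sum(w * (groups[depth].upper if depth < len(groups) else 0.0)
                        for w, (groups, _) in zip(weights, sources))
        if len(top) == k and top[0][0] >= threshold:
            break
    return [(-neg_id, score) for score, neg_id in sorted(top, reverse=True)]


# Toy scores in [0, 1] (higher is better); object 7 has no text score.
spatial = {1: 0.9, 2: 0.8, 3: 0.3, 4: 0.6, 5: 0.1, 6: 0.7, 7: 0.2, 8: 0.5}
text = {1: 0.3, 2: 0.9, 3: 0.8, 4: 0.1, 5: 0.7, 6: 0.6, 8: 0.4}
sources = [(make_ra_groups(spatial, 2), spatial),
           (make_ra_groups(text, 2), text)]
result = external_ta(sources, weights=[0.5, 0.5], k=3)
print([obj for obj, _ in result])  # [2, 6, 1]
```

On this toy data the loop stops after the second RA group of each index: the threshold falls to 0.5 × 0.5 + 0.5 × 0.4 = 0.45, below the third-best score of 0.60, so the remaining groups are pruned without being read.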
