Text vs. space: efficient geo-search query processing

Many web search services allow users to constrain text queries to a geographic location (e.g., yoga classes near Santa Monica). Important examples include local search engines such as Google Local and location-based search services for smartphones. Several research groups have studied the efficient execution of queries that mix text and geography; their approaches usually combine inverted lists with a spatial access method such as an R-tree or a space-filling curve. In this paper, we take a fresh look at this problem. We feel that previous work has often focused on the spatial aspect at the expense of performance considerations in text processing, such as inverted index access, compression, and caching. We describe new and existing approaches and discuss the different perspectives they take. We then compare their performance in extensive experiments on large document collections. Our results indicate that a query processor that combines state-of-the-art text processing techniques with a simple coarse-grained spatial structure can outperform existing approaches by up to two orders of magnitude. In fact, even a naive approach that first uses a simple inverted index and then filters out any documents outside the query range outperforms many previous methods.
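To make the naive text-first baseline concrete, the following is a minimal sketch of the idea: look up the query terms in an inverted index, intersect the posting lists, and only then discard documents whose location falls outside the query range. It is an illustration under stated assumptions, not the implementation evaluated in the paper. The class name `NaiveGeoIndex`, its methods, the single point per document, the planar coordinates, and the axis-aligned rectangular query range are all hypothetical choices made for this example, and compression, caching, and ranking are omitted.

```python
from collections import defaultdict


class NaiveGeoIndex:
    """Toy text-first index: inverted lists plus one (x, y) point per document.

    Hypothetical illustration only; all names here are assumptions.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc IDs
        self.location = {}                # doc ID -> (x, y) point

    def add(self, doc_id, text, point):
        """Index a document's terms and remember its location."""
        self.location[doc_id] = point
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query, rect):
        """Return sorted doc IDs that contain every query term and lie in rect.

        rect is (min_x, min_y, max_x, max_y), boundaries inclusive.
        """
        terms = query.lower().split()
        if not terms:
            return []
        # Step 1: text processing. Intersect posting lists, shortest first.
        lists = sorted((self.postings.get(t, set()) for t in terms), key=len)
        candidates = set(lists[0])
        for plist in lists[1:]:
            candidates &= plist
        # Step 2: spatial filter applied only to the text matches.
        min_x, min_y, max_x, max_y = rect
        hits = []
        for doc_id in candidates:
            x, y = self.location[doc_id]
            if min_x <= x <= max_x and min_y <= y <= max_y:
                hits.append(doc_id)
        return sorted(hits)
```

For example, after adding three documents with made-up coordinates, a query for "yoga classes" restricted to the rectangle `(0, 0, 10, 10)` returns only the documents that match both terms and lie inside that rectangle. The design point is that the spatial check touches only documents that already passed the text filter, so all of the text-side machinery can be reused unchanged.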
