A framework for efficient spatial web object retrieval

The conventional Internet is acquiring a geospatial dimension. Web documents are being geo-tagged and geo-referenced objects such as points of interest are being associated with descriptive text documents. The resulting fusion of geo-location and documents enables new kinds of queries that take into account both location proximity and text relevancy. This paper proposes a new indexing framework for top-k spatial text retrieval. The framework leverages the inverted file for text retrieval and the R-tree for spatial proximity querying. Several indexing approaches are explored within this framework. The framework encompasses algorithms that utilize the proposed indexes for computing location-aware as well as region-aware top-k text retrieval queries, thus taking into account both text relevancy and spatial proximity to prune the search space. Results of empirical studies with an implementation of the framework demonstrate that the paper’s proposal is capable of excellent performance.

[1]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[2]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[3]  Christos Faloutsos,et al.  Signature files: an access method for documents and its analytical performance evaluation , 1984, TOIS.

[4]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[5]  Alistair Moffat,et al.  Coding for compression in full-text retrieval systems , 1992, Data Compression Conference, 1992..

[6]  Christos Faloutsos,et al.  Hybrid Index Organizations for Text Databases , 1992, EDBT.

[7]  Stephen E. Robertson,et al.  Okapi at TREC-3 , 1994, TREC.

[8]  Stephen E. Robertson,et al.  GatfordCentre for Interactive Systems ResearchDepartment of Information , 1996 .

[9]  Nick Roussopoulos,et al.  Nearest neighbor queries , 1995, SIGMOD '95.

[10]  Ramesh C. Jain,et al.  Similarity indexing with the SS-tree , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[11]  Ron Sacks-Davis,et al.  Filtered document retrieval with frequency-sorted indexes , 1996 .

[12]  Shin'ichi Satoh,et al.  The SR-tree: an index structure for high-dimensional nearest neighbor queries , 1997, SIGMOD '97.

[13]  W. Bruce Croft,et al.  A language modeling approach to information retrieval , 1998, SIGIR '98.

[14]  Kotagiri Ramamohanarao,et al.  Inverted files versus signature files for text indexing , 1998, TODS.

[15]  Hanan Samet,et al.  Distance browsing in spatial databases , 1999, TODS.

[16]  Scott T. Leutenegger,et al.  Master-client R-trees: a new parallel R-tree architecture , 1999, Proceedings. Eleventh International Conference on Scientific and Statistical Database Management.

[17]  Luis Gravano,et al.  Computing Geographical Scopes of Web Resources , 2000, VLDB.

[18]  Alistair Moffat,et al.  Vector-space ranking with effective early termination , 2001, SIGIR '01.

[19]  Kevin S. McCurley,et al.  Geospatial mapping and navigation of the web , 2001, WWW '01.

[20]  Luis Gravano,et al.  Evaluating top-k queries over Web-accessible databases , 2002, Proceedings 18th International Conference on Data Engineering.

[21]  Moni Naor,et al.  Optimal aggregation algorithms for middleware , 2001, PODS.

[22]  Ron Sivan,et al.  Web-a-where: geotagging web content , 2004, SIGIR '04.

[23]  Christopher B. Jones,et al.  Workshop on geographic information retrieval, SIGIR 2004 , 2004, SIGF.

[24]  CHENGXIANG ZHAI,et al.  A study of smoothing methods for language models applied to information retrieval , 2004, TOIS.

[25]  M. Sanderson,et al.  Analyzing geographic queries , 2004 .

[26]  Xing Xie,et al.  Hybrid index structures for location-based web search , 2005, CIKM '05.

[27]  W. Bruce Croft,et al.  Optimization strategies for complex queries , 2005, SIGIR '05.

[28]  Mário J. Silva,et al.  Indexing and ranking in Geo-IR systems , 2005, GIR '05.

[29]  Mark Sanderson,et al.  Spatio-textual Indexing for Geographical Search on the Web , 2005, SSTD.

[30]  Tobias Bjerregaard,et al.  A survey of research and practices of Network-on-chip , 2006, CSUR.

[31]  Haibo Hu,et al.  Range Nearest-Neighbor Query , 2006, IEEE Trans. Knowl. Data Eng..

[32]  Torsten Suel,et al.  Efficient query processing in geographic web search engines , 2006, SIGMOD Conference.

[33]  JUSTIN ZOBEL,et al.  Inverted files for text search engines , 2006, CSUR.

[34]  Chen Li,et al.  Processing Spatial-Keyword (SK) Queries in Geographic Information Retrieval (GIR) Systems , 2007, 19th International Conference on Scientific and Statistical Database Management (SSDBM 2007).

[35]  Naphtali Rishe,et al.  Keyword Search on Spatial Databases , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[36]  Marios Hadjieleftheriou,et al.  R-Trees - A Dynamic Index Structure for Spatial Searching , 2008, ACM SIGSPATIAL International Workshop on Advances in Geographic Information Systems.

[37]  Young-In Song,et al.  Finding question-answer pairs from online forums , 2008, SIGIR '08.

[38]  Anthony K. H. Tung,et al.  Keyword Search in Spatial Databases: Towards Searching by Document , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[39]  Christian S. Jensen,et al.  Efficient Retrieval of the Top-k Most Relevant Spatial Web Objects , 2009, Proc. VLDB Endow..

[40]  Christian S. Jensen,et al.  Retrieving top-k prestige-based relevant spatial web objects , 2010, Proc. VLDB Endow..

[41]  Chen Li,et al.  Hybrid Indexing and Seamless Ranking of Spatial and Textual Features of Web Documents , 2010, DEXA.

[42]  Ken C. K. Lee,et al.  IR-Tree: An Efficient Index for Geographic Document Search , 2011, IEEE Trans. Knowl. Data Eng..