A Hierarchical Framework for Top-k Location-Aware Error-Tolerant Keyword Search

Location-aware services have become widely available on a variety of devices. The resulting fusion of spatio-textual data enables top-k queries that account for both location proximity and text relevance. Because user input often contains misspellings and spatio-textual databases suffer from data quality issues, it is necessary to support error-tolerant spatial keyword search for end-users. Existing studies mainly focus on set-based textual relevance and therefore cannot return reasonable results when the input tokens do not exactly match those of records in the database. In this paper, we propose a novel framework to solve the problem of top-k location-aware similarity search with fuzzy token matching. We propose a hierarchical index, the HGR-Tree, to capture signatures of both spatial and textual relevance. Based on this index structure, we devise a best-first search algorithm that preferentially accesses HGR-Tree nodes containing more similar objects and prunes nodes containing dissimilar ones. We further devise an incremental search strategy to reduce the overhead introduced by fuzzy token matching. Experimental results on real-world POI datasets show that our framework outperforms state-of-the-art methods by one to two orders of magnitude.
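
To make the query semantics concrete, the following is a minimal brute-force sketch of a top-k location-aware query with fuzzy token matching. It is not the HGR-Tree or the best-first algorithm proposed here; it only illustrates the kind of ranking such an index would accelerate. The linear combination weight `alpha`, the token similarity threshold `delta`, the normalization constant `max_dist`, and the simplified text score (average best-match similarity per query token) are all assumptions made for illustration.

```python
import heapq
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def token_sim(a: str, b: str) -> float:
    """Normalized edit similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def fuzzy_text_sim(query_tokens, record_tokens, delta=0.7):
    """Average, over query tokens, of the best token similarity >= delta.

    Assumption: a simplified stand-in for fuzzy token matching; pairs
    below the threshold delta contribute nothing.
    """
    if not query_tokens or not record_tokens:
        return 0.0
    total = 0.0
    for q in query_tokens:
        best = max(token_sim(q, r) for r in record_tokens)
        if best >= delta:
            total += best
    return total / len(query_tokens)


def spatial_proximity(p, q, max_dist):
    """Euclidean proximity scaled to [0, 1]; max_dist is an assumed bound."""
    return max(0.0, 1.0 - math.dist(p, q) / max_dist)


def top_k(records, query_loc, query_text, k, alpha=0.5, delta=0.7, max_dist=10.0):
    """Score every record (no index, no pruning) and keep the k best.

    records: list of (text, (x, y)) pairs. Returns (score, record_id) pairs.
    """
    q_tokens = query_text.lower().split()
    scored = []
    for rid, (text, loc) in enumerate(records):
        s = alpha * spatial_proximity(query_loc, loc, max_dist)
        s += (1 - alpha) * fuzzy_text_sim(q_tokens, text.lower().split(), delta)
        scored.append((s, rid))
    return heapq.nlargest(k, scored)


# Hypothetical POI records: (name, location).
pois = [
    ("Starbucks Coffee", (0.0, 0.0)),
    ("Starbuck Cafe", (1.0, 1.0)),
    ("Burger Palace", (0.1, 0.1)),
]
# The misspelled token "starbuks" still matches "starbucks" fuzzily.
results = top_k(pois, (0.0, 0.0), "starbuks coffee", k=2)
```

In this sketch the misspelled query still ranks the two Starbucks-like records above the nearer but textually unrelated one, which an exact set-based match would not do. Scanning every record is what a hierarchical index with pruning is meant to avoid.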
