Toward Efficient Navigation of Massive-Scale Geo-Textual Streams

With the spread of portable devices, numerous applications continuously produce huge streams of geo-tagged textual data. Indexing such geo-textual streams efficiently is an important and challenging task in both data management and AI applications, e.g., real-time data stream mining and targeted advertising. State-of-the-art indexing methods fall short here because they focus on optimizing search over static datasets and incur high index maintenance costs. In this paper, we present NQ-tree, which combines new structure designs and self-tuning methods to navigate the trade-off between update efficiency and search efficiency. Our contributions include: (1) designing multiple stores, each with a different emphasis on write-friendliness and read-friendliness; (2) using data compression techniques to reduce the I/O cost; (3) exploiting both spatial and keyword information to improve pruning efficiency; and (4) proposing an analytical cost model and an online self-tuning method to achieve efficient access under different workloads. Experiments on two real-world datasets show that NQ-tree outperforms two well-designed baselines by up to 10×.
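
The interplay between contributions (1) and (3) can be illustrated with a minimal Python sketch. It is not the NQ-tree structure itself, whose details the abstract does not give: the names `GeoText`, `Block`, and `TwoStoreIndex`, the fixed buffer capacity, and the per-block bounding-box and vocabulary summaries are all assumptions made for illustration. Writes go to an append-only buffer (write-friendly); full buffers are frozen into summarized blocks (read-friendly) that a query can skip using either spatial or keyword information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoText:
    """A geo-tagged text object (hypothetical record layout)."""
    x: float
    y: float
    keywords: frozenset


@dataclass
class Block:
    """Read-friendly unit: frozen objects plus summaries used for pruning."""
    objects: list
    mbr: tuple        # bounding box: (min_x, min_y, max_x, max_y)
    vocab: frozenset  # union of all keywords in the block


class TwoStoreIndex:
    """Illustrative two-store index; an assumption, not the NQ-tree design."""

    def __init__(self, buffer_capacity=4):
        self.buffer_capacity = buffer_capacity
        self.buffer = []         # write-friendly store: plain appends
        self.blocks = []         # read-friendly store: summarized blocks
        self.blocks_scanned = 0  # counts blocks that survived pruning

    def insert(self, obj):
        # An update is a cheap append; summaries are built only on flush.
        self.buffer.append(obj)
        if len(self.buffer) >= self.buffer_capacity:
            self._flush()

    def _flush(self):
        objs = self.buffer
        xs = [o.x for o in objs]
        ys = [o.y for o in objs]
        vocab = frozenset().union(*(o.keywords for o in objs))
        self.blocks.append(Block(objs, (min(xs), min(ys), max(xs), max(ys)), vocab))
        self.buffer = []

    def search(self, region, keywords):
        """Return objects inside `region` that contain every query keyword."""
        query = frozenset(keywords)
        qx1, qy1, qx2, qy2 = region

        def matches(o):
            return qx1 <= o.x <= qx2 and qy1 <= o.y <= qy2 and query <= o.keywords

        # The unsummarized buffer must always be scanned.
        results = [o for o in self.buffer if matches(o)]
        for block in self.blocks:
            bx1, by1, bx2, by2 = block.mbr
            disjoint = bx2 < qx1 or bx1 > qx2 or by2 < qy1 or by1 > qy2
            # Prune on spatial information OR on keyword information.
            if disjoint or not query <= block.vocab:
                continue
            self.blocks_scanned += 1
            results.extend(o for o in block.objects if matches(o))
        return results
```

In this sketch a larger `buffer_capacity` makes updates cheaper but forces each query to scan more unsummarized objects, which is the kind of update/search trade-off that a cost model and self-tuning step would have to balance.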
