SKYPE: Top-k Spatial-keyword Publish/Subscribe Over Sliding Window

With the prevalence of social media and GPS-enabled devices, a massive amount of geo-textual data is being generated in a streaming fashion, enabling a variety of applications such as location-based recommendation and information dissemination. In this paper, we investigate a novel real-time top-k monitoring problem over a sliding window of streaming data; that is, we continuously maintain the top-k most relevant geo-textual messages (e.g., geo-tagged tweets) for a large number of spatial-keyword subscriptions (e.g., registered users interested in local events) simultaneously. To provide the most recent information under a controllable memory cost, the sliding window model is employed on the streaming geo-textual data. To the best of our knowledge, this is the first work to study top-k spatial-keyword publish/subscribe over a sliding window. We propose a novel system, called Skype (Top-k Spatial-keyword Publish/Subscribe). In Skype, to continuously maintain top-k results for a massive number of subscriptions, we devise a novel indexing structure over subscriptions such that each incoming message can be delivered immediately on its arrival. Moreover, to reduce the expensive top-k re-evaluation cost triggered by message expiration, we develop a novel cost-based k-skyband technique that reduces the number of re-evaluations in a cost-effective way. Extensive experiments verify the efficiency and effectiveness of our proposed techniques.
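To make the re-evaluation problem concrete, the following is a minimal sketch of the plain k-skyband idea for a single subscription over a count-based sliding window. It is not the cost-based variant or the subscription index proposed in the paper. The class name `SkybandTopK`, the `Entry` record, and the assumption that each message arrives with a precomputed relevance score (standing in for a combined spatial and textual score) are all hypothetical and introduced only for illustration. A message is dropped once k newer messages score at least as high, since it can then never re-enter the top-k before it expires; the retained messages therefore suffice to answer top-k after any expiration without rescanning the window.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    msg_id: int          # arrival order; larger means newer
    score: float         # assumed precomputed relevance to the subscription
    dominators: int = 0  # newer messages seen so far with score >= this one


class SkybandTopK:
    """Hypothetical k-skyband buffer for one subscription (illustrative only)."""

    def __init__(self, k: int, window_size: int) -> None:
        self.k = k
        self.window_size = window_size
        self.entries: list[Entry] = []  # kept in arrival order
        self.next_id = 0

    def insert(self, score: float) -> int:
        msg_id = self.next_id
        self.next_id += 1
        oldest_live = msg_id - self.window_size + 1
        kept = []
        for e in self.entries:
            if e.msg_id < oldest_live:
                continue  # expired: slid out of the window
            if e.score <= score:
                e.dominators += 1  # the new message dominates e
            if e.dominators < self.k:
                kept.append(e)  # still inside the k-skyband
        kept.append(Entry(msg_id, score))
        self.entries = kept
        return msg_id

    def top_k(self) -> list[Entry]:
        # Ties are broken in favour of the newer message, matching the
        # dominance rule used in insert().
        ranked = sorted(self.entries, key=lambda e: (-e.score, -e.msg_id))
        return ranked[: self.k]
```

For example, with `k=2` and `window_size=3`, inserting the scores 5, 3, 4, 1 in that order leaves the window holding 3, 4, 1, and `top_k()` returns the entries scored 4 and 3 without revisiting the expired message. This sketch scans the whole buffer on every insertion; it illustrates only what the skyband retains, not an efficient maintenance strategy.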
