Top-k Temporal Keyword Query over Social Media Data

Analytic jobs over social media data typically need to explore data from different periods. However, most existing keyword search work merely uses the creation time of items as the measure of their recency. In this paper, we propose the top-k temporal keyword query, which ranks items by the aggregate number of times they were shared during a given time window. We provide a query algorithm that can be executed over a general temporal inverted index. A complexity analysis based on the power-law distribution gives an upper bound on the number of accessed items. Furthermore, a two-tier structure and a piecewise maximum approximation sketch are proposed as refinements. Extensive empirical studies on a real-life dataset show that combining the two refinements achieves a remarkable performance improvement under different query settings.
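To make the query semantics concrete, the following is a minimal brute-force sketch, not the indexed algorithm proposed in the paper. It assumes each item carries a set of keywords and a sorted list of share timestamps, and it scores an item by the number of shares that fall inside the query window. The names `shares_in_window` and `top_k_temporal`, the data layout, and the tie-breaking rule are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right


def shares_in_window(share_times, start, end):
    """Count timestamps t with start <= t <= end.

    Assumes share_times is sorted in ascending order.
    """
    return bisect_right(share_times, end) - bisect_left(share_times, start)


def top_k_temporal(items, keyword, start, end, k):
    """Return up to k (item_id, score) pairs for one keyword and time window.

    items: dict mapping item_id -> (set of keywords, sorted share timestamps).
    This layout is a hypothetical stand-in for a temporal inverted index.
    Items with no shares in the window are skipped.
    """
    scored = []
    for item_id, (keywords, share_times) in items.items():
        if keyword not in keywords:
            continue
        score = shares_in_window(share_times, start, end)
        if score > 0:
            scored.append((item_id, score))
    # Highest score first; ties broken by item_id for deterministic output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Toy data: timestamps are arbitrary integers.
items = {
    "a": ({"x"}, [1, 2, 3, 10]),
    "b": ({"x"}, [5, 6]),
    "c": ({"y"}, [5, 5, 5]),
    "d": ({"x"}, [20]),
}
result = top_k_temporal(items, "x", 4, 10, 2)
```

This sketch scans every item, so it only illustrates the ranking criterion. Note that item "a" was created earliest but can still rank in a later window if it is shared there, which is the distinction from ranking by creation time.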
