Tweet Timeline Generation via Graph-Based Dynamic Greedy Clustering

When issuing a query on a microblogging service, a user typically receives an archive of tweets as part of a retrospective piece on the impact of social media. To make the retrieved tweets easier to understand, it is useful to produce a summarized timeline for the given topic. However, tweet timeline generation is challenging because microblogs are noisy and time-sensitive. In this paper, we propose a graph-based dynamic greedy clustering approach that considers the coverage, relevance, and novelty of the tweet timeline. First, tweet embedding representations are learned in order to construct the tweet semantic graph. Based on this graph, we estimate the coverage of the timeline according to graph connectivity. Furthermore, we integrate a noise elimination component that removes noisy tweets using lexical and semantic features based on relevance and novelty. Experimental results on the public Text Retrieval Conference (TREC) Twitter corpora demonstrate the effectiveness of the proposed approach.
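The general idea of selecting timeline tweets by graph coverage can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper: it swaps in a plain greedy cover over a cosine-similarity graph, and the function names, the `threshold` and `min_relevance` parameters, and the use of list position as a stand-in for chronological order are all assumptions made for illustration.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_graph(embeddings, threshold):
    """Link two tweets when their embedding similarity reaches the threshold."""
    n = len(embeddings)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def greedy_timeline(embeddings, relevance, threshold=0.8, min_relevance=0.0):
    """Hypothetical greedy cover: repeatedly pick the tweet that covers the
    most still-uncovered tweets (itself plus its graph neighbours)."""
    adj = build_graph(embeddings, threshold)
    # Assumed noise filter: ignore tweets whose relevance is below the floor.
    uncovered = {i for i in adj if relevance[i] >= min_relevance}
    timeline = []
    while uncovered:
        # Ties are broken by higher relevance, then by earlier position.
        best = max(
            uncovered,
            key=lambda i: (len((adj[i] | {i}) & uncovered), relevance[i], -i),
        )
        timeline.append(best)
        # Everything similar to the chosen tweet counts as covered (not novel).
        uncovered -= adj[best] | {best}
    # List position is assumed to reflect posting order.
    return sorted(timeline)
```

Given tweet embeddings and per-tweet relevance scores, `greedy_timeline` returns the indices of the selected tweets. Near-duplicates of an already selected tweet are skipped, which is a crude proxy for novelty; the paper's dynamic clustering and its lexical and semantic noise features are not modeled here.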
