QU at TREC-2014: Online Clustering with Temporal and Topical Expansion for Tweet Timeline Generation

Abstract: We describe our participation in the TREC-2014 microblog track, which builds on our first participation the previous year. We present our approaches to this year's two tasks: temporally-anchored ad-hoc search and tweet timeline generation. For the ad-hoc search task, we combined topical expansion with temporal models for retrieval. Our run based on standard pseudo-relevance feedback query expansion outperformed all of our other runs, with a relatively high mean average precision (MAP). For the timeline generation task, we applied online incremental clustering to the tweets retrieved for a given query. This approach creates semantic clusters dynamically while providing a framework for detecting redundant tweets and selecting representative ones to add to the final timeline. The results show that incremental clustering of tweets retrieved through a temporal retrieval model produced the best effectiveness among the submitted runs.
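To make the timeline generation idea concrete, here is a minimal sketch of online incremental clustering over a ranked list of retrieved tweets. It is not the system described in the paper: the whitespace tokenizer, the Jaccard similarity measure, the 0.5 threshold, the choice of the first tweet in each cluster as its representative, and the names `tokenize`, `jaccard`, `Cluster`, and `build_timeline` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenization (assumed stand-in for tweet preprocessing)."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two term sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Cluster:
    representative: str              # first tweet that opened the cluster
    terms: Set[str]                  # term set of the representative
    members: List[str] = field(default_factory=list)


def build_timeline(ranked_tweets: List[str], threshold: float = 0.5) -> List[str]:
    """Process tweets one at a time, in retrieval order.

    A tweet joins the most similar existing cluster if the similarity reaches
    the threshold (it is then treated as redundant); otherwise it opens a new
    cluster and becomes that cluster's representative on the timeline.
    """
    clusters: List[Cluster] = []
    for tweet in ranked_tweets:
        terms = tokenize(tweet)
        best: Optional[Cluster] = None
        best_sim = 0.0
        for cluster in clusters:
            sim = jaccard(terms, cluster.terms)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best.members.append(tweet)   # redundant: absorbed, not added to timeline
        else:
            clusters.append(Cluster(tweet, terms, [tweet]))
    return [cluster.representative for cluster in clusters]


if __name__ == "__main__":
    retrieved = [
        "earthquake hits city center",
        "earthquake hits city center today",
        "new phone released this week",
    ]
    for line in build_timeline(retrieved):
        print(line)
```

Because each tweet is compared only against clusters that already exist, the input can be consumed as a stream, and the threshold controls how aggressively near-duplicates are folded into an existing cluster.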
