Silent Day Detection on Microblog Data

Microblogs have become an increasingly popular source of information for users who want updates about the world. Given the rapid growth of microblog data, users are often interested in getting daily (or even hourly) updates about a certain topic. Existing studies on microblog retrieval have mainly focused on how to rank results by relevance, but little attention has been paid to whether any results should be returned to search users at all. This paper studies the problem of silent day detection. Specifically, given a query and a set of tweets collected over a certain time period (such as a day), we need to determine whether the set contains any tweets relevant to the query. If it does not, the day is referred to as a silent day. Silent day detection lets us avoid overwhelming users with non-relevant tweets. We formulate the problem as a classification problem and propose two types of new features based on collective information from query terms. Experimental results on TREC collections show that these new features are more effective in detecting silent days than previously proposed ones.
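The definition of a silent day can be illustrated with a minimal Python sketch. It only shows how ground-truth labels for the classification problem might be derived from per-tweet relevance judgments; it is not the classifier or the features proposed in the paper. The function name `label_silent_days` and the `(query_id, day, is_relevant)` tuple layout are assumptions made for this illustration.

```python
def label_silent_days(judgments):
    """Derive silent-day labels from per-tweet relevance judgments.

    judgments: iterable of (query_id, day, is_relevant) tuples, one per
    judged tweet (hypothetical layout, assumed for this sketch).
    Returns a dict mapping (query_id, day) to True when the day is silent,
    i.e. when none of its tweets is relevant to the query.
    """
    has_relevant = {}
    for query_id, day, is_relevant in judgments:
        key = (query_id, day)
        # A single relevant tweet is enough to make the day non-silent.
        has_relevant[key] = has_relevant.get(key, False) or bool(is_relevant)
    return {key: not found for key, found in has_relevant.items()}


# Illustrative data only, not taken from any TREC collection.
example = [
    ("Q1", "day-1", False),
    ("Q1", "day-1", True),
    ("Q1", "day-2", False),
]
labels = label_silent_days(example)
```

Here `labels` marks `("Q1", "day-1")` as not silent and `("Q1", "day-2")` as silent. A classifier for silent day detection would be trained to predict these labels without access to the judgments.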
