Temporal Pseudo-relevance Feedback in Microblog Retrieval

Twitter has become a major outlet for news, discussion and commentary on ongoing events and trends. Effective searching of Twitter collections poses a number of challenges for traditional document-based information retrieval (IR) approaches, such as limited document term statistics and spam. In this paper we propose a novel approach to pseudo-relevance feedback based on the temporal profiles of n-grams extracted from the top N pseudo-relevance feedback tweets. A weighted graph models the temporal correlation between n-grams, and a PageRank variant combines evidence from the term distribution of the pseudo-relevant documents with temporal evidence from the collection. Preliminary experiments with the TREC Microblogging 2011 Twitter corpus indicate that, with parameter optimisation, retrieval effectiveness can be improved.
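To make the idea of ranking n-grams over a temporal correlation graph concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes, purely for illustration, that edge weights are positive Pearson correlations between per-interval n-gram counts, that the feedback term distribution enters as the teleport (personalisation) vector of a weighted PageRank, and that the damping factor is 0.85. The names `pearson`, `rank_ngrams`, `profiles` and `feedback_counts` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length count series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_ngrams(profiles, feedback_counts, damping=0.85, iterations=50):
    """Score n-grams with a personalised, weighted PageRank.

    profiles:        n-gram -> list of collection counts per time interval
    feedback_counts: n-gram -> frequency in the top N feedback tweets
    Assumption: the teleport vector is the normalised feedback frequency.
    """
    terms = sorted(profiles)

    # Undirected edges weighted by positive temporal correlation (assumed choice).
    weights = {t: {} for t in terms}
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            w = max(0.0, pearson(profiles[a], profiles[b]))
            if w > 0:
                weights[a][b] = w
                weights[b][a] = w

    # Teleport distribution from the pseudo-relevant documents.
    total = sum(feedback_counts.get(t, 0) for t in terms)
    if total > 0:
        prior = {t: feedback_counts.get(t, 0) / total for t in terms}
    else:
        prior = {t: 1.0 / len(terms) for t in terms}

    scores = dict(prior)
    for _ in range(iterations):
        new_scores = {t: (1 - damping) * prior[t] for t in terms}
        for u in terms:
            out_weight = sum(weights[u].values())
            if out_weight == 0:
                # Dangling node: redistribute its mass according to the prior.
                for t in terms:
                    new_scores[t] += damping * scores[u] * prior[t]
            else:
                for v, w in weights[u].items():
                    new_scores[v] += damping * scores[u] * w / out_weight
        scores = new_scores
    return scores


# Toy data (invented for illustration): two bursty n-grams and one flat one.
profiles = {
    "quake": [0, 1, 8, 9, 1],
    "tsunami": [0, 2, 7, 8, 1],
    "lunch": [3, 3, 3, 3, 3],
}
feedback_counts = {"quake": 5, "tsunami": 2, "lunch": 3}
scores = rank_ngrams(profiles, feedback_counts)
```

In this toy setting the n-gram "tsunami" ends up ranked above "lunch" despite occurring less often in the feedback tweets, because its temporal profile correlates with that of "quake". The correlation measure, the damping factor and the number of feedback tweets are the kind of parameters that would need tuning.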