HIT at TREC 2012 Microblog Track

This paper describes our approaches to the TREC 2012 Microblog Track. We explore query expansion and document expansion techniques to address the retrieval of short tweet texts. Further, we examine the webpages linked by URLs in tweets as an external source to improve retrieval performance. A learning-to-rank technique is then adopted to combine all features for better performance. Finally, we accomplish microblog filtering by comparing each new tweet against the top m relevant tweets retrieved from the history.
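
The filtering step described above can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, how the scores are aggregated, or the decision threshold, so the choices below (bag-of-words cosine similarity, averaging over the top m tweets, and a threshold of 0.3) are assumptions made purely for illustration. The function names `tokenize`, `cosine`, and `filter_tweet` are likewise hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization; a stand-in for real tweet preprocessing.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_tweet(new_tweet, top_m_relevant, threshold=0.3):
    # Assumed decision rule: accept the new tweet when its average similarity
    # to the top-m relevant tweets from the history reaches the threshold.
    if not top_m_relevant:
        return False
    new_vec = Counter(tokenize(new_tweet))
    scores = [cosine(new_vec, Counter(tokenize(t))) for t in top_m_relevant]
    return sum(scores) / len(scores) >= threshold


# Hypothetical history of tweets already judged relevant to a query.
history = ["bbc world service staff cuts", "bbc cuts world service jobs"]
on_topic = filter_tweet("bbc world service cuts announced", history)
off_topic = filter_tweet("my cat is sleeping", history)
```

In this sketch the history would be the top m tweets returned by the retrieval system for the query before the new tweet arrived; other aggregations, such as the maximum similarity, would fit the same description equally well.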