A syntax-aware re-ranker for microblog retrieval

We tackle the problem of improving microblog retrieval algorithms by proposing a robust structural representation of (query, tweet) pairs. We employ these structures in a principled kernel learning framework that automatically extracts and learns highly discriminative features. We test the generalization power of our approach on the TREC Microblog 2011 and 2012 tasks. We find that relational syntactic features generated by structural kernels are effective for learning to rank (L2R) and can easily be combined with those of other existing systems to boost their accuracy. In particular, the results show that our L2R approach improves on almost all the participating systems at TREC, only using their raw scores as a single feature. Our method yields an average increase of 5% in retrieval effectiveness and 7 positions in system ranks.