Blog, Web, Entity, and Relevance Feedback

We describe the participation of the University of Amsterdam's ILPS group in the Blog, Web, Entity, and Relevance Feedback tracks at TREC 2009. Our main preliminary conclusions are as follows. For the Blog track, we find that for top stories identification a blogs-to-news approach outperforms a simple news-to-blogs approach. This is interesting, as the blogs-to-news approach starts with no input except a date, whereas the news-to-blogs approach also takes news headlines as input. For the Web track, we find that spam is an important issue in the ad hoc task and that Wikipedia-based heuristic optimization approaches help to boost retrieval performance; we assume these approaches help by reducing the amount of spam among the top-ranked documents. For the diversity task, we explored different methods. Initial results show that clustering and a topic model-based approach perform similarly, and both perform better than a query log-based approach. Our performance in the Entity track was downright disappointing: the use of co-occurrence models led to poor results. An initial analysis shows that while our approach is able to find correct entity names, it fails to find homepages for these entities. For the Relevance Feedback track, we find that a topical diversity approach provides good feedback documents. Further, our relevance feedback algorithm seems to help most when sufficient relevant documents are available.
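The abstract does not specify the relevance feedback algorithm. To illustrate why feedback can depend on having enough relevant documents, here is a minimal sketch of one standard, swapped-in technique: RM3-style expansion with relevance-based language models. It is not necessarily the algorithm used in the runs described here. Assumptions: documents are pre-tokenized lists of terms, background statistics are approximated from the feedback set itself, and the function name `relevance_model` and the defaults for `num_terms`, `lam` (interpolation weight), and `mu` (Dirichlet prior) are hypothetical.

```python
from collections import Counter


def relevance_model(query: list[str], feedback_docs: list[list[str]],
                    num_terms: int = 10, lam: float = 0.5,
                    mu: float = 2000.0) -> dict[str, float]:
    """Hypothetical sketch of RM3-style query expansion from feedback documents.

    query: query terms; feedback_docs: tokenized feedback documents.
    num_terms, lam and mu are illustrative defaults (assumptions).
    Returns an expanded query model mapping term -> weight (weights sum to 1).
    """
    # Original query model: maximum-likelihood estimate over the query terms.
    q_model = {w: c / len(query) for w, c in Counter(query).items()}

    # Assumption: background statistics come from the feedback set itself.
    coll = Counter(t for doc in feedback_docs for t in doc)
    coll_len = sum(coll.values())
    if coll_len == 0:
        return q_model

    def p_w_d(w: str, counts: Counter, dlen: int) -> float:
        # Dirichlet-smoothed document language model P(w|d).
        return (counts[w] + mu * coll[w] / coll_len) / (dlen + mu)

    # RM1: P(w|R) is proportional to sum_d P(w|d) * P(q|d).
    rm: Counter = Counter()
    for doc in feedback_docs:
        counts, dlen = Counter(doc), len(doc)
        q_lik = 1.0
        for q in query:
            q_lik *= p_w_d(q, counts, dlen)  # query likelihood weights doc d
        for w in coll:
            rm[w] += p_w_d(w, counts, dlen) * q_lik

    total = sum(rm.values())
    if total == 0:  # e.g. no query term occurs in the feedback set
        return q_model

    # Keep the top expansion terms, renormalize, then interpolate with the query (RM3).
    top = rm.most_common(num_terms)
    top_mass = sum(v for _, v in top)
    expanded: Counter = Counter()
    for w, v in top:
        expanded[w] += (1 - lam) * v / top_mass
    for w, v in q_model.items():
        expanded[w] += lam * v
    return dict(expanded)
```

With only one or two feedback documents, the estimate of P(w|R) is dominated by the idiosyncratic terms of those few documents; with more relevant documents it becomes more stable. This is consistent with, though not evidence for, the observation that feedback helps most when sufficient relevant documents are available.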
