Paper information - Language Modeling Approaches to Blog Post and Feed Finding

Language Modeling Approaches to Blog Post and Feed Finding

We describe our participation in the TREC 2007 Blog track. In the opinion task we compared the performance of Indri with that of our mixture model and examined how external expansion and document priors affect opinion finding. Results show that an out-of-the-box Indri implementation outperforms our mixture model and that external expansion on a news corpus is very beneficial. Opinion finding can be improved by using either lexicons or the number of comments as document priors. Our approach to the feed distillation task is based on aggregating post-level scores to obtain a feed-level ranking. We integrated time-based and persistence aspects into the retrieval model. After correcting bugs in our post-score aggregation module, we found that time-based retrieval improves results only marginally, while persistence-based ranking results in substantial improvements under the right circumstances.
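As a rough illustration of the feed distillation idea in the abstract (aggregating post-level scores into a feed-level ranking), here is a minimal sketch. The abstract does not say which aggregation function the authors used, so summation is an assumption here, and the function name `rank_feeds`, the identifiers, and the toy scores are all hypothetical.

```python
from collections import defaultdict


def rank_feeds(post_scores, post_to_feed, aggregate=sum):
    """Aggregate post-level retrieval scores into a feed-level ranking.

    post_scores:  dict mapping post id -> retrieval score for one query
    post_to_feed: dict mapping post id -> id of the feed containing the post
    aggregate:    function reducing a list of scores to one number
                  (sum is an assumption, not taken from the paper)
    """
    per_feed = defaultdict(list)
    for post_id, score in post_scores.items():
        per_feed[post_to_feed[post_id]].append(score)

    feed_scores = {feed: aggregate(scores) for feed, scores in per_feed.items()}

    # Highest score first; ties are broken by feed id for determinism.
    return sorted(feed_scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: four scored posts spread over three feeds.
post_scores = {"p1": 0.5, "p2": 0.25, "p3": 0.5, "p4": 0.125}
post_to_feed = {"p1": "feedA", "p2": "feedA", "p3": "feedB", "p4": "feedC"}

ranking = rank_feeds(post_scores, post_to_feed)
```

Passing a different reducer, for example `aggregate=max`, changes how strongly a feed with many moderately relevant posts is favored over a feed with a single highly relevant post.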

Maarten de Rijke | Wouter Weerkamp | Breyten Ernsting

[1] Craig MacDonald, et al. Overview of the TREC 2007 Blog Track, 2007, TREC.

[2] Craig MacDonald, et al. Overview of the TREC 2006 Blog Track, 2006, TREC.

[3] W. Bruce Croft, et al. Time-based language models, 2003, CIKM '03.

[4] Gilad Mishne, et al. Applied text analytics for blogs, 2007.

[5] W. Bruce Croft, et al. A Markov random field model for term dependencies, 2005, SIGIR '05.

[6] M. de Rijke, et al. The University of Amsterdam at the TREC 2007 Enterprise Track, 2006.

[7] Maarten de Rijke, et al. Query and Document Models for Enterprise Search, 2007, TREC.