How to improve search accuracy for difficult topics is an under-addressed yet important research question. In this paper, we consider the scenario in which search results are so poor that none of the top-ranked documents is relevant to a user's query, and we propose to exploit negative feedback to improve retrieval accuracy for such difficult queries. Specifically, we propose to learn from a number of top-ranked non-relevant documents in order to rerank the remaining unseen documents. We develop several approaches within the language modeling framework that penalize documents similar to the known non-relevant ones. To evaluate the proposed methods, we adapt standard TREC collections to construct a test collection containing only difficult queries. Experimental results show that the proposed approaches are effective at improving the retrieval accuracy of difficult queries.
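To make the idea concrete, below is a minimal sketch of one such penalization scheme, assuming a maximum-likelihood unigram language model built from the known non-relevant documents and a simple linear score penalty. The function names (`unigram_lm`, `rerank_with_negative_feedback`) and the weight `beta` are illustrative assumptions, not the paper's actual formulation.

```python
from collections import Counter
import math

def unigram_lm(texts):
    """Maximum-likelihood unigram language model over a list of documents."""
    counts = Counter()
    for t in texts:
        counts.update(t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def avg_log_likelihood(doc, model, eps=1e-9):
    """Average per-term log probability of a document under a language
    model; higher values mean the document is more similar to the model.
    eps is a crude floor for terms unseen in the model (an assumption)."""
    terms = doc.lower().split()
    return sum(math.log(model.get(w, eps)) for w in terms) / max(len(terms), 1)

def rerank_with_negative_feedback(unseen, orig_scores, negative_docs, beta=0.5):
    """Penalize unseen documents in proportion to their similarity to a
    model estimated from the known non-relevant (negative) documents."""
    neg_model = unigram_lm(negative_docs)
    adjusted = {
        doc_id: orig_scores[doc_id] - beta * avg_log_likelihood(text, neg_model)
        for doc_id, text in unseen.items()
    }
    # Return remaining documents ranked by their penalized scores.
    return sorted(adjusted.items(), key=lambda kv: kv[1], reverse=True)
```

Under this sketch, a document whose terms are likely under the negative model incurs a larger penalty and is pushed down the ranking, while documents dissimilar to the known non-relevant ones are left largely unaffected.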