Improve retrieval accuracy for difficult queries using negative feedback

How to improve search accuracy for difficult topics is an under-addressed, yet important research question. In this paper, we consider a scenario when the search results are so poor that none of the top-ranked documents is relevant to a user's query, and propose to exploit negative feedback to improve retrieval accuracy for such difficult queries. Specifically, we propose to learn from a certain number of top-ranked non-relevant documents to rerank the rest unseen documents. We propose several approaches to penalizing the documents that are similar to the known non-relevant documents in the language modeling framework. To evaluate the proposed methods, we adapt standard TREC collections to construct a test collection containing only difficult queries. Experiment results show that the proposed approaches are effective for improving retrieval accuracy of difficult queries.