Active learning to maximize accuracy vs. effort in interactive information retrieval

We consider an interactive information retrieval task in which the user wants to find several to many relevant documents with minimal effort. Given an initial document ranking, user interaction with the system produces relevance feedback (RF), which the system then uses to revise the ranking. This interactive process repeats until the user terminates the search. To maximize accuracy relative to user effort, we propose an active learning strategy: at each iteration, the document whose relevance is maximally uncertain to the system is slotted high in the ranking in order to obtain user feedback for it. Simulated feedback on the Robust04 TREC collection shows that our active learning approach dominates several standard RF baselines relative to the amount of feedback provided by the user. Evaluation on Robust04 under noisy feedback and on the LETOR collections further demonstrates the effectiveness of active learning, as well as the value of negative feedback in this task scenario.
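To make the selection step concrete, the sketch below couples uncertainty sampling with an iterative re-ranking loop. It is a minimal illustration under stated assumptions, not the paper's implementation: documents are assumed to be feature vectors, a scikit-learn logistic-regression scorer stands in for the retrieval model, and `doc_vectors`, `initial_ranking`, and `user_judges` are hypothetical inputs (the last simulating, possibly noisy, user relevance judgments).

```python
# Minimal sketch of uncertainty-driven relevance feedback (assumptions:
# documents are dense feature vectors; a logistic-regression scorer stands in
# for the retrieval model; `user_judges` simulates user feedback, which may be noisy).
import numpy as np
from sklearn.linear_model import LogisticRegression


def active_feedback_loop(doc_vectors, initial_ranking, user_judges, n_rounds=10):
    """One simulated search session: each round, ask the user about the most
    uncertain document, record its label, and re-rank the remaining documents."""
    judged = {}                       # doc_id -> 0/1 relevance label
    ranking = list(initial_ranking)   # current ranking of unjudged documents

    for _ in range(n_rounds):
        if not ranking:
            break
        if len(set(judged.values())) < 2:
            # Too few labels to fit a scorer yet: fall back to the current top document.
            query_doc = ranking[0]
        else:
            X = np.array([doc_vectors[d] for d in judged])
            y = np.array([judged[d] for d in judged])
            model = LogisticRegression(max_iter=1000).fit(X, y)
            probs = model.predict_proba(
                np.array([doc_vectors[d] for d in ranking]))[:, 1]
            # Uncertainty sampling: query the document whose predicted
            # probability of relevance is closest to 0.5.
            query_doc = ranking[int(np.argmin(np.abs(probs - 0.5)))]
            # Re-rank the unjudged documents by predicted relevance.
            ranking = [ranking[i] for i in np.argsort(-probs)]
        judged[query_doc] = user_judges(query_doc)   # user provides (noisy) feedback
        ranking.remove(query_doc)

    return ranking, judged
```

In this sketch the queried document is removed from the active ranking once judged; a deployed system would instead slot it at a high rank so the user encounters it naturally, as described above.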
