A split-list approach for relevance feedback in information retrieval

In this paper we present a new algorithm for relevance feedback (RF) in information retrieval. Unlike conventional RF algorithms, which take feedback from the top-ranked documents, our algorithm is an active feedback algorithm that deliberately chooses the documents the user judges. The objectives are (a) to increase the number of judged relevant documents and (b) to increase the diversity of the judged documents during the RF process. The algorithm exploits document contexts by splitting the retrieval list into sub-lists according to the query term patterns that occur in the top-ranked documents, where a pattern is a single query term, a pair of query terms occurring as a phrase, or query terms occurring in proximity. The algorithm is iterative, taking one document for feedback in each iteration. We evaluate it on the TREC-6, -7, -8, -2005 and GOV2 collections, simulating user feedback with the TREC relevance judgements. The experimental results show that the proposed split-list algorithm outperforms the conventional RF algorithm and is more reliable than a comparable algorithm based on maximal marginal relevance.
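The split-list idea described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the pattern detection (single term, adjacent pair as a phrase, pair within a proximity window) and the round-robin selection over sub-lists are simplifying assumptions, and all function names and the `window` parameter are hypothetical.

```python
from collections import defaultdict

def term_patterns(tokens, query_terms, window=5):
    """Return the query-term patterns present in a tokenised document:
    single terms, adjacent pairs (treated as phrases), and pairs of
    terms co-occurring within a proximity window (an assumed definition)."""
    positions = {t: [i for i, tok in enumerate(tokens) if tok == t]
                 for t in query_terms}
    patterns = set()
    for t, pos in positions.items():
        if pos:
            patterns.add(("single", t))
    qts = sorted(query_terms)
    for i, a in enumerate(qts):
        for b in qts[i + 1:]:
            for pa in positions[a]:
                for pb in positions[b]:
                    d = abs(pa - pb)
                    if d == 1:
                        patterns.add(("phrase", a, b))
                    if 0 < d <= window:
                        patterns.add(("proximity", a, b))
    return patterns

def split_list_feedback(ranked_docs, query_terms, n_feedback=3, window=5):
    """Split a ranked list into sub-lists keyed by query-term pattern,
    then pick one unjudged document per iteration, cycling over the
    sub-lists to diversify the judged set (a round-robin simplification)."""
    sublists = defaultdict(list)  # pattern -> doc ids in rank order
    for doc_id, tokens in ranked_docs:
        for p in term_patterns(tokens, query_terms, window):
            sublists[p].append(doc_id)
    if not sublists:
        return []
    order = sorted(sublists)  # deterministic sub-list order for the sketch
    chosen, seen = [], set()
    i, stalled = 0, 0
    while len(chosen) < n_feedback and stalled < len(order):
        key = order[i % len(order)]
        i += 1
        # take the highest-ranked not-yet-judged document in this sub-list
        doc = next((d for d in sublists[key] if d not in seen), None)
        if doc is None:
            stalled += 1
            continue
        stalled = 0
        seen.add(doc)
        chosen.append(doc)
    return chosen
```

For example, with documents `d1` = "the cat dog runs", `d2` = "cat likes fish", `d3` = "dog barks" and query terms {cat, dog}, `d1` falls into the phrase, proximity and both single-term sub-lists, so the round-robin pass selects one document from each distinct pattern rather than simply the top three of the original ranking.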
