Classifying and filtering blind feedback terms to improve information retrieval effectiveness

The classification of blind relevance feedback (BRF) terms described in this paper aims at increasing precision or recall by determining which terms decrease, increase, or leave the corresponding information retrieval (IR) performance metric unchanged. Classification and IR experiments are performed on the German and English GIRT data, using the BM25 retrieval model. Several basic memory-based classifiers are trained on different feature sets, grouping together features from different query expansion (QE) approaches. Combined classifiers employ the results of the basic classifiers and correctness predictions as features. The best combined classifiers for German (English) yield improvements of 22.9% (26.4%) for precision-oriented and 5.8% (1.9%) for recall-oriented term classification compared to the best basic classifiers. IR experiments based on this term classification have also been performed. Filtering out different types of BRF terms shows that selecting feedback terms predicted to increase precision improves average precision significantly compared to experiments without BRF. MAP is improved by +19.8% compared to the best standard BRF experiment (+11% for German). BRF term classification also increases the number of relevant documents retrieved, geometric MAP, and P@10 in comparison to standard BRF. Experiments based on an optimal classification show that there is potential for improving IR effectiveness even further.
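As a minimal illustration of the filtering step described above, the sketch below keeps only those candidate feedback terms whose predicted class is "increases precision" before they are added to the query. The class labels, feature names, and the toy heuristic classifier are assumptions made for this example only; in the paper, trained memory-based classifiers (e.g. TiMBL) produce these predictions from QE-derived features.

```python
# Hypothetical sketch of BRF term filtering by predicted effect on precision.
# predict_effect() is a stand-in for the trained memory-based classifier;
# the feature names ("feedback_tf", "idf") and threshold values are invented
# here purely for illustration.

from dataclasses import dataclass


@dataclass
class CandidateTerm:
    term: str
    features: dict  # QE features, e.g. frequency in feedback docs, idf


def predict_effect(candidate: CandidateTerm) -> str:
    """Toy stand-in for the classifier: returns one of
    'increases_precision', 'decreases_precision', 'no_change'."""
    score = candidate.features.get("feedback_tf", 0) * candidate.features.get("idf", 0.0)
    if score > 5.0:
        return "increases_precision"
    if score < 1.0:
        return "decreases_precision"
    return "no_change"


def filter_expansion_terms(candidates, keep_label="increases_precision"):
    """Keep only feedback terms predicted to match keep_label,
    mirroring the filtering strategy that performed best in the paper."""
    return [c.term for c in candidates if predict_effect(c) == keep_label]


if __name__ == "__main__":
    candidates = [
        CandidateTerm("retrieval", {"feedback_tf": 4, "idf": 2.1}),
        CandidateTerm("the", {"feedback_tf": 9, "idf": 0.05}),
        CandidateTerm("girt", {"feedback_tf": 3, "idf": 3.4}),
    ]
    print("Terms added to the query:", filter_expansion_terms(candidates))
```

The surviving terms would then be appended to the original query and passed to the BM25 retrieval model.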
