Flexible pseudo-relevance feedback via selective sampling

Although Pseudo-Relevance Feedback (PRF) is a widely used technique for enhancing average retrieval performance, it may actually hurt performance for around one-third of a given set of topics. To enhance the reliability of PRF, Flexible PRF has been proposed, which adjusts the number of pseudo-relevant documents and/or the number of expansion terms for each topic. This paper explores a new, inexpensive Flexible PRF method, called Selective Sampling, which is unique in that it can skip documents in the initial ranked output to look for more “novel” pseudo-relevant documents. While Selective Sampling is only comparable to Traditional PRF in terms of average performance and reliability, per-topic analyses show that Selective Sampling outperforms Traditional PRF almost as often as Traditional PRF outperforms Selective Sampling. Thus, treating the top P documents as relevant is often not the best strategy. However, predicting when Selective Sampling outperforms Traditional PRF appears to be as difficult as predicting when a PRF method fails. For example, our per-topic analyses show that even the proportion of truly relevant documents in the pseudo-relevant set is not necessarily a good performance predictor.
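The contrast with Traditional PRF can be made concrete with a small sketch. The Python fragment below is illustrative only: it assumes each ranked document is represented as a set of terms and uses a simple term-overlap threshold as the "novelty" test; both the representation and the threshold are assumptions made for exposition, not the selection criterion studied in the paper.

    from typing import List, Set

    def select_pseudo_relevant(ranked_doc_terms: List[Set[str]],
                               num_docs: int = 10,
                               max_overlap: float = 0.8) -> List[int]:
        """Pick pseudo-relevant documents from an initial ranking, skipping
        documents that add little that is novel.

        ranked_doc_terms[i] is the (assumed) term-set representation of the
        i-th ranked document; max_overlap is an illustrative threshold."""
        selected: List[int] = []
        for rank, terms in enumerate(ranked_doc_terms):
            if len(selected) == num_docs:
                break
            # Traditional PRF would take this document unconditionally;
            # Selective Sampling may skip it and look further down the ranking.
            novel = all(
                len(terms & ranked_doc_terms[j])
                / max(1, min(len(terms), len(ranked_doc_terms[j]))) <= max_overlap
                for j in selected
            )
            if novel:
                selected.append(rank)
        return selected

Setting max_overlap to 1.0 makes the sketch degenerate to Traditional PRF (take the top num_docs documents unconditionally), which is the baseline the abstract compares against.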
