Paper Information - Thesaurus-Based Feedback to Support Mixed Search and Browsing Environments

We propose and evaluate a query expansion mechanism that supports searching and browsing in collections of annotated documents. Based on generative language models, our feedback mechanism uses document-level annotations to bias the generation of expansion terms and to generate browsing suggestions in the form of concepts selected from a controlled vocabulary (as typically used in digital library settings). We provide a detailed formalization of our feedback mechanism and evaluate its effectiveness using the TREC 2006 Genomics track test set. In terms of retrieval effectiveness, we find a 20% improvement in mean average precision over a query-likelihood baseline, while also increasing precision at 10. When we base the parameter estimation and feedback generation of our algorithm on a large corpus, we also find an improvement over state-of-the-art relevance models. The browsing suggestions are assessed along two dimensions: relevance and specificity. We present an account of per-topic results, which helps to understand for which types of queries our feedback mechanism is particularly helpful.

Maarten de Rijke | Edgar Meij

[1] Chris Buckley, et al. Improving automatic query expansion, 1998, SIGIR '98.

[2] W. Bruce Croft, et al. Relevance-Based Language Models, 2001, SIGIR '01.

[3] Kevyn Collins-Thompson, et al. Query expansion using random walk models, 2005, CIKM '05.

[4] William R. Hersh, et al. A comparative analysis of retrieval features used in the TREC 2006 Genomics Track passage retrieval task, 2007, AMIA.

[5] Elmer V. Bernstam, et al. A day in the life of PubMed: analysis of a typical day's query log, 2007, Journal of the American Medical Informatics Association: JAMIA.

[6] Stanley F. Chen, et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.

[7] Koraljka Golub, et al. Browsing and searching behavior in the Renardus web service: a study based on log analysis, 2004, JCDL.

[8] Ellen M. Voorhees, et al. Using WordNet to disambiguate word senses for text retrieval, 1993, SIGIR.

[9] Fernando Diaz, et al. Improving the estimation of relevance models using large external corpora, 2006, SIGIR.

[10] William R. Hersh, et al. TREC GENOMICS Track Overview, 2003, TREC.

[11] Tao Tao, et al. Accurate language model estimation with document expansion, 2005, CIKM '05.