A few examples go a long way: constructing query models from elaborate query formulations

We address a specific enterprise document search scenario in which the information need is expressed in an elaborate manner: a short query of a few keywords together with examples of key reference pages. Given this setup, we investigate how the examples can be used to improve end-to-end performance on the document retrieval task. Our approach is based on a language modeling framework, where the query model is modified to resemble the example pages. We compare several methods for sampling expansion terms from the example pages to support query-dependent and query-independent query expansion; the latter is motivated by the wish to increase "aspect recall" and attempts to uncover aspects of the information need not captured by the query. For evaluation we use the CSIRO data set created for the TREC 2007 Enterprise track. The best performance is achieved by query models based on query-independent sampling of expansion terms from the example documents.
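The idea of moving a query model toward the example pages can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the abstract does not specify the sampling methods, so this sketch assumes a simple query-independent variant that picks the most frequent terms in the example documents and linearly interpolates them with the original query. The function names, the whitespace tokenizer, the cutoff `k`, and the weight `lam` are all hypothetical choices made for illustration.

```python
from collections import Counter


def term_distribution(docs):
    """Maximum-likelihood term distribution over a list of documents.

    Assumption: lowercased whitespace tokenization, no stopword removal.
    """
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: count / total for term, count in counts.items()}


def expanded_query_model(query, example_docs, k=5, lam=0.5):
    """Interpolate the original query model with expansion terms.

    Assumptions (not taken from the paper): expansion terms are the k most
    probable terms in the example documents, chosen without looking at the
    query (query-independent), and lam weights the original query.
    """
    p_query = term_distribution([query])
    if not p_query:
        raise ValueError("query must contain at least one term")

    p_examples = term_distribution(example_docs)
    # Sort by descending probability, then alphabetically for determinism.
    top = sorted(p_examples.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    if not top:
        return p_query  # no examples available: fall back to the query

    # Renormalize the selected expansion terms so they sum to one.
    norm = sum(prob for _, prob in top)
    p_expansion = {term: prob / norm for term, prob in top}

    terms = set(p_query) | set(p_expansion)
    return {
        term: lam * p_query.get(term, 0.0)
        + (1.0 - lam) * p_expansion.get(term, 0.0)
        for term in terms
    }


if __name__ == "__main__":
    model = expanded_query_model(
        "solar energy",
        ["solar panels convert sunlight", "solar panels need sunlight"],
        k=2,
    )
    for term, prob in sorted(model.items(), key=lambda kv: -kv[1]):
        print(f"{term}\t{prob:.3f}")
```

Because the expansion terms are selected without reference to the query, terms such as "panels" can enter the model even though the query never mentions them, which is the intuition behind trying to raise aspect recall. A query-dependent variant would instead weight candidate terms by their association with the query terms.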
