Latent concept expansion using Markov random fields

Query expansion, in the form of pseudo-relevance feedback or relevance feedback, is a common technique used to improve retrieval effectiveness. Most previous approaches have ignored important issues, such as the role of features and the importance of modeling term dependencies. In this paper, we propose a robust query expansion technique based on the Markov random field model for information retrieval. The technique, called latent concept expansion, provides a mechanism for modeling term dependencies during expansion. Furthermore, the use of arbitrary features within the model provides a powerful framework for going beyond the simple term occurrence features that are implicitly used by most other expansion techniques. We evaluate our technique against relevance models, a state-of-the-art language modeling query expansion technique. Our model demonstrates consistent and significant improvements in retrieval effectiveness across several TREC data sets. We also describe how our technique can be used to generate meaningful multi-term concepts for tasks such as query suggestion and reformulation.
