Modeling Term Associations for Ad-Hoc Retrieval Performance Within Language Modeling Framework

Previous research has shown that term associations can improve the effectiveness of information retrieval (IR) systems. However, most existing approaches focus on query reformulation; document reformulation has only recently begun to be studied. In this paper, we study how to use term association measures for document modeling and which types of measures are effective in document language models. We propose a probabilistic term association measure, compare it with traditional methods such as the similarity coefficient and window-based methods in the language modeling (LM) framework, and show that significant improvements over query likelihood (QL) retrieval can be obtained. We also compare the method with state-of-the-art document modeling techniques based on latent mixture models.
