Exploring Topic-based Language Models for Effective Web Information Retrieval

The main obstacle to providing focused search is the relative opaqueness of search requests: searchers tend to express complex information needs in only a couple of keywords. Our overall aim is to find out if, and how, topic-based language models can lead to more effective web information retrieval. In this paper we explore the retrieval performance of a topic-based model that combines topical models with other language models based on cross-entropy. We first define our topical categories and train our topical models on the .GOV2 corpus by building parsimonious language models. We then test the topic-based model on the TREC8 small Web data collection for ad-hoc search. Our experimental results show that the topic-based model outperforms both the standard language model and the parsimonious model.
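
As a rough illustration of the kind of combination described above, the sketch below ranks documents by the negative cross-entropy between a query model and a mixture of document, topic, and collection models. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the linear-interpolation form, the mixture weights, the probability floor, and the toy data are all hypothetical, and the topic model is estimated by plain maximum likelihood rather than as a parsimonious model.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model: relative term frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def mixture(doc_lm, topic_lm, coll_lm, w_doc=0.6, w_topic=0.2, w_coll=0.2):
    """Linear interpolation of three unigram models.

    The weights are illustrative placeholders (assumption), not tuned values.
    """
    vocab = set(doc_lm) | set(topic_lm) | set(coll_lm)
    return {
        term: w_doc * doc_lm.get(term, 0.0)
        + w_topic * topic_lm.get(term, 0.0)
        + w_coll * coll_lm.get(term, 0.0)
        for term in vocab
    }


def neg_cross_entropy(query_lm, doc_mix, floor=1e-12):
    """Score = sum_t P(t|query) * log P(t|mixture); higher is better.

    `floor` (assumption) avoids log(0) for terms unseen in every model.
    """
    return sum(
        p_q * math.log(max(doc_mix.get(term, 0.0), floor))
        for term, p_q in query_lm.items()
    )


# Toy data (hypothetical): two documents and one topical category.
docs = {
    "d1": "tax form federal income tax".split(),
    "d2": "park trail hiking federal land".split(),
}
topic_tokens = "tax income revenue form budget".split()  # assumed query topic
query_tokens = "income tax".split()

coll_lm = mle([tok for toks in docs.values() for tok in toks])
topic_lm = mle(topic_tokens)
query_lm = mle(query_tokens)

scores = {
    doc_id: neg_cross_entropy(query_lm, mixture(mle(toks), topic_lm, coll_lm))
    for doc_id, toks in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy setup the topic and collection components give every document some probability mass for the query terms, so a document that lacks them is penalized rather than scored at zero. How the weights are set and which topic model is paired with a given query are left open here.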
