Web directories as topical context

In this paper we explore whether the Open Directory (DMOZ) can be used to classify queries into topical categories at different levels, and whether this topical context can be used to improve retrieval performance. We set up a user study in which participants explicitly classify queries into topical categories. Categories are either chosen freely from DMOZ or selected from a list of suggestions generated by several automatic topic categorization techniques. The results of this user study show that DMOZ categories are suitable for topic categorization, and that either free search or evaluation of a list of suggestions can be used to elicit the topical context. Free search leads to more specific topic categories than the list of suggestions. Participants show moderate agreement between their individual judgments, but broadly agree on the initial levels of the chosen categories. Using the topic categories selected through free search as topical context leads to significant improvements over the baseline retrieval results. The more general topic categories selected from the suggestions list, and the top-level categories, do not lead to significant improvements.
