Parsimonious Relevance and Concept Models

We describe our participation in the CLEF 2008 Domain Specific track. We address three research questions: (i) what are the effects of estimating and applying relevance models to the domain-specific collection used at CLEF 2008, (ii) what are the results of parsimonizing these relevance models, and (iii) what are the results of applying concept models for blind relevance feedback? Parsimonization is a technique that re-estimates the term probabilities in a language model against a reference model, making the resulting model sparser and more focused. Concept models are distributions over vocabulary terms associated with concepts in a thesaurus or ontology, estimated from the documents annotated with those concepts. They can be used for blind relevance feedback by first translating a query to concepts and then translating the concepts back to query terms. We find that applying relevance models significantly improves both mean average precision and early precision on the current test collection. Moreover, parsimonizing the relevance models improves mean average precision on title-only queries and early precision on title+narrative queries. Our concept models significantly outperform a baseline query-likelihood run in terms of both mean average precision and early precision, on both title-only and title+narrative queries.
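The parsimonization step described above can be sketched as an EM procedure in the spirit of parsimonious language models: the E-step attributes each term occurrence partly to the document-specific model and partly to the background (reference) model, and the M-step renormalizes and prunes low-probability terms, which is what makes the model sparse. This is a minimal illustrative sketch, not the authors' implementation; the function name, the mixing weight `lam`, the iteration count, and the pruning `threshold` are all assumptions.

```python
def parsimonize(doc_tf, background, lam=0.1, iters=20, threshold=1e-4):
    """Re-estimate P(t|D) against a background model P(t|C).

    doc_tf:     dict term -> raw term frequency in the document
    background: dict term -> P(t|C) under the reference (collection) model
    lam:        weight of the document-specific model (illustrative value;
                a small lam pushes common terms into the background model)
    """
    total = sum(doc_tf.values())
    p_doc = {t: tf / total for t, tf in doc_tf.items()}  # MLE initialization
    for _ in range(iters):
        # E-step: expected count of each term under the document model
        e = {}
        for t, tf in doc_tf.items():
            num = lam * p_doc.get(t, 0.0)
            denom = num + (1 - lam) * background.get(t, 1e-9)
            e[t] = tf * num / denom if denom > 0 else 0.0
        # M-step: renormalize and prune terms below the threshold (sparsity)
        norm = sum(e.values())
        p_doc = {t: v / norm for t, v in e.items() if v / norm >= threshold}
        norm2 = sum(p_doc.values())
        p_doc = {t: v / norm2 for t, v in p_doc.items()}
    return p_doc
```

On a toy example, a stopword-like term that is frequent in both the document and the collection (e.g. "the") is progressively pushed out of the document model, while terms that are distinctive for the document keep most of the probability mass.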
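The concept-model feedback loop (query to concepts, then concepts back to terms) can be sketched as follows, assuming per-concept term distributions P(t|c) have already been estimated from concept-annotated documents. Concepts are scored by the query's likelihood under each concept model, and the expanded term distribution is the mixture P(t|Q) = Σ_c P(t|c) P(c|Q). The function name, the scoring scheme, and the toy thesaurus in the usage example are illustrative assumptions, not the paper's exact method.

```python
def concept_feedback(query_terms, p_term_given_concept, top_k=5):
    """Expand a query via concept models.

    query_terms:          list of query terms
    p_term_given_concept: dict concept -> dict term -> P(t|c), estimated
                          from the documents annotated with each concept
    top_k:                number of expansion terms to return
    """
    # Step 1 (query -> concepts): score each concept by the query's
    # likelihood under that concept's language model.
    scores = {}
    for c, model in p_term_given_concept.items():
        score = 1.0
        for t in query_terms:
            score *= model.get(t, 1e-9)  # floor for unseen terms (assumption)
        scores[c] = score
    norm = sum(scores.values())
    p_concept = {c: s / norm for c, s in scores.items()}
    # Step 2 (concepts -> terms): mix the concept language models,
    # weighted by P(c|Q), into an expanded term distribution.
    p_term = {}
    for c, pc in p_concept.items():
        for t, pt in p_term_given_concept[c].items():
            p_term[t] = p_term.get(t, 0.0) + pt * pc
    return sorted(p_term.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

For example, with a toy thesaurus where an "agriculture" concept model assigns high probability to "farm" and "crop", the single-term query "farm" is routed through that concept and comes back expanded with the related term "crop".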
