Ranking using multiple document types in desktop search

A typical desktop environment contains many document types (email, presentations, web pages, PDFs, etc.), each with different metadata. Predicting which types of documents a user is looking for in the context of a given query is a crucial part of providing effective desktop search. The problem is similar to selecting resources in distributed IR, but there are some important differences. In this paper, we quantify the impact of type prediction on producing a merged ranking for desktop search and introduce a new prediction method that exploits type-specific metadata. In addition, we show that type prediction performance and search effectiveness can be further enhanced by combining existing type prediction methods using discriminative learning models. Our experiments employ pseudo-desktop collections and a human computation game for acquiring realistic and reusable queries.
