Task-based evaluation of exploratory search systems

Evaluation of interactive search systems has always been time-consuming and complex, which probably explains the relatively low level of interest among IR researchers in this type of evaluation in the past. Yet the limitations of batch-style system evaluations can no longer be ignored. We present several case studies of evaluations in interactive settings. Several of these evaluations offer valuable new insights into system adequacy, which more than compensates for the reduced reproducibility of the results. We distinguish system-centered evaluations, which focus on performance, from user-centered (task-based) evaluations, which focus on adequacy; the latter take the natural task of a user as their starting point. Task-based evaluations suggest that proper HCI design is probably a more important factor for user satisfaction than the quality of statistical indexing and ranking methods. User-centered and system-centered evaluations of interactive systems measure different aspects of quality. The challenge is to design an evaluation in which the different components that determine system adequacy and performance can be identified and their relationships quantified.
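The contrast between the two evaluation styles can be made concrete with a minimal sketch. The Python fragment below is illustrative only: the function names, the relevance judgments, and the session-log fields (`task_completed`, `satisfaction`) are hypothetical, not taken from the studies discussed above. It places a classic system-centered measure (average precision over a ranked result list) next to a toy user-centered aggregate (task success rate and mean satisfaction across sessions).

```python
# Hypothetical sketch contrasting system-centered and user-centered evaluation.

def average_precision(ranked_ids, relevant_ids):
    """System-centered: average precision over a single ranked result list."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

def task_adequacy(sessions):
    """User-centered: mean task success and satisfaction across sessions."""
    n = len(sessions)
    success = sum(s["task_completed"] for s in sessions) / n
    satisfaction = sum(s["satisfaction"] for s in sessions) / n  # e.g. 1-5 Likert
    return {"success_rate": success, "mean_satisfaction": satisfaction}

if __name__ == "__main__":
    # Toy relevance judgments for one topic (hypothetical document IDs).
    print(average_precision(["d3", "d1", "d7", "d2"], {"d1", "d2"}))  # 0.5
    # Toy per-user session logs from a task-based study (hypothetical fields).
    print(task_adequacy([
        {"task_completed": True, "satisfaction": 4},
        {"task_completed": False, "satisfaction": 2},
    ]))
```

The point of the sketch is that the two scores are computed from entirely different observations: a system can do well on the first while doing poorly on the second, which is precisely why quantifying the relationship between them is posed as the open challenge.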
