New measures for the evaluation of interactive information retrieval systems: Normalized task completion time and normalized user effectiveness

User satisfaction, though difficult to measure, is the main goal of Information Retrieval (IR) systems. In recent years, as Interactive Information Retrieval (IIR) systems have become increasingly popular, user effectiveness has also become critical in the evaluation of IIR systems. However, existing IR evaluation measures are not well suited to gauging user satisfaction or user effectiveness. In this paper, we propose two new measures for evaluating IIR systems: Normalized Task Completion Time (NT) and Normalized User Effectiveness (NUE). The two measures overcome limitations of existing measures and are efficient to compute in that they do not require a large pool of search tasks. A user study was conducted to investigate the relationships between the two measures and the user satisfaction and effectiveness of a given IR system. The learning effects described by NT, NUE, and raw task completion time were also studied and compared. The results show that NT is strongly correlated with user satisfaction, that NUE is a better indicator of system effectiveness than task completion time, and that both new measures are superior to task completion time in describing the learning effect of the given IR system.
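The abstract does not give the paper's exact formulas for NT and NUE, so the following is only a minimal sketch of one plausible reading: normalizing a user's completion time by the average time other users spent on the same task, and relating a recall-style effectiveness score to that normalized time. All function names and formulas here are illustrative assumptions, not the authors' definitions.

```python
# Hypothetical sketch only: the exact NT/NUE formulas are not stated in the
# abstract, so the normalizations below are assumed for illustration.

def normalized_task_time(user_time, all_times):
    """NT sketch: one user's task completion time divided by the mean
    completion time of all users on the same task (assumed normalization).
    Values below 1.0 mean the user was faster than average."""
    return user_time / (sum(all_times) / len(all_times))

def normalized_user_effectiveness(relevant_found, total_relevant, nt):
    """NUE sketch: recall-style effectiveness per unit of normalized
    time (an assumed way to combine effectiveness with NT)."""
    recall = relevant_found / total_relevant
    return recall / nt

# Example: three users' times (seconds) on one search task.
times = [120.0, 90.0, 150.0]
nt = normalized_task_time(90.0, times)            # 90 / 120 = 0.75
nue = normalized_user_effectiveness(8, 10, nt)    # 0.8 / 0.75
print(round(nt, 2), round(nue, 2))
```

Under this sketch, a task-wise normalization is what lets a small set of search tasks suffice: each user's score is interpreted relative to other users on the same task rather than against a large task pool.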
