Effects of position and number of relevant documents retrieved on users' evaluations of system performance

Information retrieval research has demonstrated that system performance does not always correlate positively with user performance, and that users often assign positive evaluation scores to search systems even when they are unable to complete tasks successfully. This research investigated the relationship between objective measures of system performance and users' perceptions of that performance. Subjects evaluated the performance of four search systems whose search results were manipulated systematically to produce different orderings and numbers of relevant documents. Three laboratory studies were conducted with a total of eighty-one subjects. The first two studies examined how the ordering of five relevant and five nonrelevant documents within a ten-item results list affected subjects' evaluations. The third study examined how varying the number of relevant documents in a ten-item results list affected subjects' evaluations. Results demonstrate linear relationships between subjects' evaluations and both the position of relevant documents in the results list and the total number of relevant documents retrieved. Of the two, the number of relevant documents retrieved was the stronger predictor of subjects' evaluation ratings and elicited a wider range of evaluation scores.
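
A minimal illustrative sketch of the kind of manipulation described above, under assumed details: the ten-item relevance labelings, the 1-7 rating values, and the least-squares fit below are hypothetical placeholders for exposition, not the study's actual materials or data.

```python
# Illustrative sketch (not the authors' materials): constructing manipulated
# ten-result lists with controlled positions/numbers of relevant documents,
# and fitting a simple linear model of evaluation score on the number of
# relevant documents retrieved. All rating values are hypothetical.
import numpy as np

def make_result_list(num_relevant: int, positions: list[int]) -> list[int]:
    """Return a 10-item list of relevance labels (1 = relevant, 0 = nonrelevant)
    with relevant documents placed at the given 1-based rank positions."""
    assert len(positions) == num_relevant
    labels = [0] * 10
    for rank in positions:
        labels[rank - 1] = 1
    return labels

# Example orderings: five relevant documents at the top vs. at the bottom
# of a ten-item list (positions chosen arbitrarily for illustration).
top_heavy = make_result_list(5, [1, 2, 3, 4, 5])
bottom_heavy = make_result_list(5, [6, 7, 8, 9, 10])
print("top-heavy:   ", top_heavy)
print("bottom-heavy:", bottom_heavy)

# Hypothetical mean evaluation ratings (1-7 scale) for lists containing
# different numbers of relevant documents; used only to show the linear fit.
num_relevant = np.array([0, 2, 4, 5, 6, 8, 10], dtype=float)
mean_rating = np.array([1.5, 2.8, 4.0, 4.6, 5.1, 6.0, 6.8])

# Ordinary least-squares fit: rating ~ slope * num_relevant + intercept.
slope, intercept = np.polyfit(num_relevant, mean_rating, deg=1)
print(f"rating ~ {slope:.2f} * num_relevant + {intercept:.2f}")
```

In this toy example the fitted slope simply summarizes how the hypothetical mean ratings rise with the number of relevant documents retrieved, mirroring the form of the linear relationship the abstract reports.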
