Effects of Rank and Precision of Search Results on Users' Evaluations of System Performance

Previous research has demonstrated that system performance does not always correlate positively with user performance, and that users often assign positive evaluation scores to systems even when they are unable to complete tasks successfully. This paper investigates the relationship between actual system performance and users' perceptions of system performance by manipulating the level of performance users experience and measuring their evaluations of that performance. Eighty-one subjects participated in one of three laboratory studies. The first two studies investigated the impact of the location (or rank order) of five relevant and five non-relevant documents in a search results list containing ten results. The third study investigated the impact of varying levels of precision (.30, .40, .50, and .60) in a search results list containing ten results. Results demonstrate statistically significant relationships between precision and subjects' evaluations of system performance, and between ranking and subjects' evaluations of system performance. Of the two, precision explained more variance in subjects' evaluation ratings and was the stronger predictor of those ratings. Finally, the number of documents subjects examined significantly influenced their evaluations, even when the difference was a single document.
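The precision levels manipulated in the third study correspond directly to the number of relevant documents in a ten-result list. As a minimal sketch of that arithmetic (assuming binary relevance judgments; the function and example list below are illustrative and not taken from the paper), precision of a fixed-length result list can be computed as follows:

```python
# Illustrative sketch: precision of a result list under binary relevance.
# A ten-result list with three relevant documents yields precision .30,
# the lowest level manipulated in the third study.

def precision(relevance_judgments):
    """Fraction of retrieved documents judged relevant (1 = relevant, 0 = not)."""
    return sum(relevance_judgments) / len(relevance_judgments)

results = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]  # 3 relevant documents out of 10
print(precision(results))  # 0.3
```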
