Modeling user variance in time-biased gain

Cranfield-style information retrieval evaluation accounts for variance in user information needs by evaluating retrieval systems over a set of search topics. For each search topic, however, traditional metrics model all users as searching ranked lists in exactly the same manner and thus have zero variance in their per-topic estimates of effectiveness. Metrics that fail to model user variance overestimate the effect size of differences between retrieval systems. Modeling user variance is critical to understanding the impact of effectiveness differences on the actual user experience: if the variance of a difference is high relative to its mean, the standardized effect size is small, and many real users will not notice the improvement. Time-biased gain is an evaluation metric that models user interaction with ranked lists displayed using document surrogates (e.g., titles and snippets). In this paper, we extend the stochastic simulation of time-biased gain to model the variation between users. We validate this new version of time-biased gain by showing that it produces distributions of gain that agree well with the actual distributions produced by real users. With a per-topic variance in its effectiveness measure, time-biased gain allows the measurement of the effect size of differences, which lets researchers understand the extent to which predicted performance improvements matter to real users.
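
To make the mechanism concrete, the sketch below shows a Monte Carlo simulation of time-biased gain with per-user behavioral variation. It is a minimal illustration under stated assumptions, not the authors' calibrated simulator: the decay half-life follows the 224-second value reported in the time-biased gain papers, but the reading-time distributions and the click/save probabilities (summary_time, doc_time, p_click_rel, p_click_non, p_save_rel) are hypothetical values chosen only to show how drawing parameters per user turns a zero-variance metric into a per-topic distribution of gain.

```python
import math
import random

HALF_LIFE = 224.0  # decay half-life in seconds, per the time-biased gain papers

def simulate_user(relevance, rng):
    """Simulate one user scanning a ranked list; return that user's gain."""
    # Draw this user's behavioral parameters so that different simulated
    # users read at different speeds and click/save with different
    # probabilities. All distributions and constants here are assumptions
    # for illustration, not the paper's calibrated values.
    summary_time = rng.lognormvariate(math.log(4.0), 0.4)   # secs per snippet
    doc_time = rng.lognormvariate(math.log(30.0), 0.5)      # secs per document
    p_click_rel = min(1.0, max(0.0, rng.gauss(0.65, 0.1)))  # P(click | relevant snippet)
    p_click_non = min(1.0, max(0.0, rng.gauss(0.35, 0.1)))  # P(click | non-relevant snippet)
    p_save_rel = min(1.0, max(0.0, rng.gauss(0.80, 0.1)))   # P(save | relevant document)

    t, gain = 0.0, 0.0
    for rel in relevance:                 # walk down the ranked list
        t += summary_time                 # time spent reading the snippet
        p_click = p_click_rel if rel else p_click_non
        if rng.random() < p_click:
            t += doc_time                 # time spent reading the document
            if rel and rng.random() < p_save_rel:
                # Gain is discounted by the probability the user is still
                # searching at time t: D(t) = exp(-t * ln 2 / half-life).
                gain += math.exp(-t * math.log(2) / HALF_LIFE)
    return gain

def tbg_distribution(relevance, n_users=1000, seed=0):
    """Monte Carlo over simulated users -> per-topic mean and variance of gain."""
    rng = random.Random(seed)
    gains = [simulate_user(relevance, rng) for _ in range(n_users)]
    mean = sum(gains) / n_users
    var = sum((g - mean) ** 2 for g in gains) / (n_users - 1)
    return mean, var

# Example: a ranked list with relevant documents (1) at ranks 1, 3, and 6.
print(tbg_distribution([1, 0, 1, 0, 0, 1, 0, 0, 0, 0]))
```

Given such per-topic means and variances for two systems, a standardized effect size such as Cohen's d (mean difference divided by pooled standard deviation) can be computed; this is exactly the quantity that a metric with zero per-topic variance overstates.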
