Adaptive Effort for Search Evaluation Metrics

We explain a wide range of search evaluation metrics as the ratio of users' gain to effort when interacting with a ranked list of results. Under this interpretation, many existing metrics measure users' effort as linear in the (expected) number of examined results, which implicitly assumes that users spend the same effort examining each result. We adapt existing metrics to account for the different effort required for relevant and non-relevant documents. Results show that such adaptive effort metrics better correlate with and predict user perceptions of search quality.
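
To make the gain-to-effort view concrete, here is a minimal Python sketch, not the paper's exact formulation: precision at k recast as total gain divided by summed examination effort, where the per-document costs e_rel and e_nonrel are hypothetical parameters standing in for the different effort on relevant and non-relevant documents.

```python
def adaptive_effort_precision(rels, k, e_rel=1.0, e_nonrel=2.0):
    """Gain-to-effort ratio over the top-k results (illustrative sketch).

    rels: binary relevance labels of the ranked list (1 = relevant, 0 = not).
    e_rel, e_nonrel: assumed per-document examination costs (hypothetical values).
    """
    top = rels[:k]
    gain = sum(top)                                        # total gain from examined results
    effort = sum(e_rel if r else e_nonrel for r in top)    # relevance-dependent effort
    return gain / effort if effort > 0 else 0.0

# Example: with e_rel == e_nonrel == 1 this reduces to ordinary precision@k.
print(adaptive_effort_precision([1, 0, 1, 0, 0], k=5))
```

Setting both costs equal recovers the uniform-effort assumption of standard metrics; choosing a larger cost for non-relevant documents penalizes rankings that force users to wade through results that take effort but yield no gain.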
