Untangling Result List Refinement and Ranking Quality: A Framework for Evaluation and Prediction

Traditional batch evaluation metrics assume that user interaction with search results is limited to scanning down a ranked list. However, modern search interfaces include additional elements that support result list refinement (RLR) through facets and filters, making user search behavior increasingly dynamic. We develop an evaluation framework that moves beyond the interaction assumption of traditional evaluation metrics and allows for batch evaluation of systems with and without RLR elements. In our framework we model user interaction as switching between different sublists, which yields a measure of user effort based on the joint effect of interaction with RLR elements and result quality. We validate the framework through a user study, comparing model predictions with actual user performance: the predictions show a significant positive correlation with real user effort. Further, unlike traditional evaluation metrics, the framework's predictions of when users stand to benefit from RLR elements reflect the findings of our user study. Finally, we use the framework to investigate under what conditions systems with and without RLR elements are likely to be effective. We simulate varying conditions concerning ranking quality, user, task, and interface properties, demonstrating a cost-effective way to study whole-system performance.
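To make the sublist-switching idea concrete, below is a minimal simulation sketch. It is not the paper's actual formulation: the sublist representation, the switching probability `p_switch`, the per-interaction cost `rlr_cost`, and the stopping rule `target_relevant` are all illustrative assumptions, and the sketch ignores, for instance, that a user may re-examine documents already seen after a switch.

```python
import random


def simulate_session(ranked_list, sublists, p_switch=0.3, rlr_cost=2.0,
                     target_relevant=3, seed=None):
    """Simulate one user session under a simple switching model.

    All names and parameters are illustrative assumptions, not the
    paper's formulation:
      - ranked_list / sublists: sequences of 0/1 relevance labels.
      - p_switch: probability of switching to an RLR sublist per step.
      - rlr_cost: effort charged for one interaction with an RLR element.
      - target_relevant: user stops after finding this many relevant docs.
    Returns total effort = documents examined + RLR interaction cost.
    """
    rng = random.Random(seed)
    current, pos = ranked_list, 0
    effort, found = 0.0, 0
    while found < target_relevant and pos < len(current):
        effort += 1.0                       # cost of examining one result
        found += current[pos]
        pos += 1
        if sublists and rng.random() < p_switch:
            current = rng.choice(sublists)  # user applies a facet/filter
            pos = 0
            effort += rlr_cost              # cost of the RLR interaction
    return effort


# Compare mean effort with and without RLR elements.
baseline = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]      # weak ranking
filtered = [[1, 1, 0, 1, 0], [0, 1, 1, 0, 0]]  # higher-precision sublists
runs = 1000
with_rlr = sum(simulate_session(baseline, filtered, seed=i)
               for i in range(runs)) / runs
without = sum(simulate_session(baseline, [], seed=i)
              for i in range(runs)) / runs
print(f"mean effort with RLR: {with_rlr:.2f}, without: {without:.2f}")
```

Under these assumptions, RLR elements pay off when the added interaction cost is outweighed by the higher precision of the filtered sublists, which is exactly the ranking-quality versus refinement trade-off the framework is designed to expose.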
