Understanding and Predicting Graded Search Satisfaction

Understanding and estimating satisfaction with search engines is an important aspect of evaluating retrieval performance. Research to date has modeled and predicted search satisfaction on a binary scale, i.e., searchers are either satisfied or dissatisfied with their search outcome. However, users' search experience is a complex construct and there are different degrees of satisfaction. As such, binary classification of satisfaction may be limiting. To the best of our knowledge, we are the first to study the problem of understanding and predicting graded (multi-level) search satisfaction. We examine sessions mined from search engine logs, where searcher satisfaction was also assessed on a multi-point scale by human annotators. Leveraging these search log data, we observe rich and non-monotonic changes in search behavior in sessions with different degrees of satisfaction. The findings suggest that we should predict finer-grained satisfaction levels. To address this issue, we model search satisfaction using features indicating search outcome, search effort, and changes in both outcome and effort during a session. We show that our approach can predict subtle changes in search satisfaction more accurately than state-of-the-art methods, affording greater insight into search satisfaction. The strong performance of our models has implications for search providers seeking to accurately measure satisfaction with their services.
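To make the feature families concrete, the sketch below shows session-level features of the three kinds the abstract names: outcome (e.g., satisfied clicks), effort (e.g., query and click counts), and change over the session. The function and feature names are hypothetical illustrations, not the paper's actual feature set, and the 30-second satisfied-click threshold is a common convention assumed here rather than taken from this work.

```python
def session_features(actions):
    """Extract illustrative outcome/effort/change features from a session.

    actions: ordered list of (action_type, dwell_seconds) tuples,
    where action_type is 'query' or 'click'.
    """
    queries = [a for a in actions if a[0] == 'query']
    clicks = [a for a in actions if a[0] == 'click']
    # Assumed convention: a click with >= 30s dwell counts as a satisfied click.
    sat_clicks = [c for c in clicks if c[1] >= 30]
    n_q, n_c = len(queries), len(clicks)

    # Change feature: compare click activity in the first vs. second half
    # of the session, a simple proxy for within-session behavioral change.
    half = len(actions) // 2
    clicks_first = sum(1 for a in actions[:half] if a[0] == 'click')
    clicks_second = sum(1 for a in actions[half:] if a[0] == 'click')

    return {
        'num_queries': n_q,                                         # effort
        'num_clicks': n_c,                                          # effort
        'sat_click_rate': len(sat_clicks) / n_c if n_c else 0.0,    # outcome
        'clicks_per_query': n_c / n_q if n_q else 0.0,              # effort
        'click_trend': clicks_second - clicks_first,                # change
    }

# A toy session: two queries, three clicks (two with long dwell).
session = [('query', 0), ('click', 45), ('query', 0), ('click', 10), ('click', 60)]
feats = session_features(session)
```

A vector of such features per session would then feed a multi-class (rather than binary) classifier over the annotated satisfaction grades.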
