On caption bias in interleaving experiments

Information retrieval evaluation most often involves manually assessing the relevance of particular query-document pairs. In cases where this is difficult (such as personalized search), interleaved comparison methods are becoming increasingly common. These methods compare pairs of ranking functions based on user clicks on search results, and thus better reflect true user preferences. However, because they depend on clicks, these methods are susceptible to bias: for example, users have previously been shown to be more likely to click on results with attractive titles and snippets. An interleaving evaluation in which one ranker tends to generate results that attract more clicks (without being more relevant) may thus be biased. We present an approach for detecting and compensating for this type of bias in interleaving evaluations. Introducing a new model of caption bias, we propose features that capture bias based on (1) per-document effects and (2) the (pairwise) relationships between a document and the documents surrounding it. We show that our model can effectively capture click behavior, with the best results achieved by a model that combines both per-document and pairwise features. Applying this model to re-weight observed user clicks, we find a small overall effect on real interleaving comparisons, but we also identify a case where an initially detected preference vanishes once caption-bias re-weighting is applied. Our results indicate that our model of caption bias is effective and can successfully identify interleaving experiments affected by caption bias.
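
To make the re-weighting idea concrete, the following is a minimal sketch (in Python) of how bias-corrected clicks could feed into an interleaving comparison. The function names, feature set, and weight values are illustrative assumptions, not the paper's actual model or learned parameters; they only show the mechanism of down-weighting clicks on results whose captions are predicted to attract extra clicks.

```python
# Hedged sketch: re-weighting clicks in an interleaved comparison using a
# hypothetical caption-bias model. All feature names, weights, and document
# assignments below are illustrative assumptions, not values from the paper.

def caption_bias_weight(doc_features, model_weights):
    """Return a multiplicative correction for a click on a result.

    The caption-attractiveness score is a simple linear combination of
    per-document features here (the paper also uses pairwise features).
    A score above 1 means the caption is predicted to attract extra clicks,
    so the click is down-weighted by 1 / score.
    """
    score = sum(model_weights.get(f, 0.0) * v for f, v in doc_features.items())
    attractiveness = max(score, 1e-6)  # keep the correction positive
    return 1.0 / attractiveness

def interleaving_credit(clicks, assignments, features, model_weights=None):
    """Credit each clicked result to the ranker (A or B) that contributed it.

    Without model_weights every click counts as 1; with model_weights each
    click counts with its bias-corrected weight.
    """
    credit = {"A": 0.0, "B": 0.0}
    for doc in clicks:
        w = 1.0
        if model_weights is not None:
            w = caption_bias_weight(features[doc], model_weights)
        credit[assignments[doc]] += w
    return credit

# Toy impression: ranker A's result d1 has a bold, keyword-heavy caption that
# (under the hypothetical model) inflates its click probability.
assignments = {"d1": "A", "d2": "B", "d3": "B"}
features = {
    "d1": {"title_bold_terms": 2.0, "snippet_len": 1.0},
    "d2": {"title_bold_terms": 1.0, "snippet_len": 1.0},
    "d3": {"title_bold_terms": 1.0, "snippet_len": 0.5},
}
weights = {"title_bold_terms": 0.6, "snippet_len": 0.4}  # assumed values
clicks = ["d1", "d2"]

print(interleaving_credit(clicks, assignments, features))           # raw clicks: tie
print(interleaving_credit(clicks, assignments, features, weights))  # re-weighted: B ahead
```

In this toy example the raw click counts produce a tie, while the re-weighted counts favor ranker B, illustrating how a detected preference can shift (or vanish) once clicks attributable to attractive captions are discounted.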
