User Behavior and Bias in Click-Based Recommender Evaluation

Measuring the quality of recommendations produced by a recommender system (RS) is challenging. Labels used for the evaluation are typically obtained from users of an RS; such explicit labels reflect true user preferences, but the way they are collected may introduce significant biases into the evaluation process. In this paper, we investigate biases that may affect labels inferred from implicit feedback, such as clicks or other user interactions. Implicit feedback is easy to collect as it is a side product of users’ natural interactions, but it can be particularly prone to biases, such as position bias. We examine this bias using click models that were developed in the information retrieval community, and show how biases predicted by these models would affect the outcomes of RS evaluation. We find that evaluation based on implicit and explicit feedback can agree well, but only when the evaluation metrics are designed to take user behavior and preferences into account. Our results highlight the importance of understanding user behavior in deployed RSs.
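
To make the notion of position bias concrete, the sketch below simulates clicks under the position-based model (PBM), one of the standard click models from the information retrieval literature; the abstract only mentions click models generically, so the choice of PBM, the examination probabilities, and the relevance values here are illustrative assumptions, not figures from the paper.

import random

# Position-based click model (PBM): an item at rank k is clicked only if it is
# examined (probability decays with rank) AND it is attractive to the user.
# The examination curve and relevance values are illustrative assumptions.
EXAMINATION = [1.0, 0.7, 0.5, 0.35, 0.25]  # P(examine | rank)
RELEVANCE   = [0.2, 0.8, 0.2, 0.8, 0.2]    # P(click | examined), per displayed item

def simulate_clicks(n_sessions=100_000, seed=0):
    """Count clicks per rank under the PBM assumption."""
    rng = random.Random(seed)
    clicks = [0] * len(EXAMINATION)
    for _ in range(n_sessions):
        for rank, (eps, rel) in enumerate(zip(EXAMINATION, RELEVANCE)):
            if rng.random() < eps and rng.random() < rel:
                clicks[rank] += 1
    return clicks

if __name__ == "__main__":
    clicks = simulate_clicks()
    for rank, c in enumerate(clicks):
        # Raw click-through rate conflates relevance with examination;
        # dividing by the examination probability recovers relevance,
        # the idea behind examination-corrected (IPS-style) estimators.
        ctr = c / 100_000
        print(f"rank {rank + 1}: CTR={ctr:.3f}, "
              f"examination-corrected={ctr / EXAMINATION[rank]:.3f}, "
              f"true relevance={RELEVANCE[rank]:.2f}")

Running this shows that raw click counts make the highly relevant item at rank 4 look worse than the mediocre item at rank 1, whereas correcting for the examination probability recovers the underlying relevance; this is the kind of distortion that motivates behavior-aware evaluation metrics.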