The Unfair Externalities of Exploration

Introduction. Online learning algorithms are a key tool in web search and content optimization, adaptively learning what users want to see. In a typical application, each time a user arrives, the algorithm chooses among various content presentation options (e.g., news articles to display), the chosen content is presented to the user, and an outcome (e.g., a click) is observed. Such algorithms must balance exploration (making potentially suboptimal decisions for the sake of acquiring information) and exploitation (using this information to make better decisions) [3]. Exploration can degrade the experience of the current user, but it improves user experience in the long run. Concerns have been raised about whether exploration in such scenarios could be unfair to some population groups, in the sense that some groups may experience too much of the downside of exploration without sufficient upside [2]. We initiate a formal study of this issue, continuing an active line of work on unfairness and bias in machine learning [4, 5, 7, 8, 11]. Our work differs from the line of research on meritocratic fairness in online learning [9, 10, 14], which considers the allocation of limited resources such as bank loans and requires that nobody be passed over in favor of a less qualified applicant. We study a fundamentally different scenario in which there are no allocation constraints and we would like to serve each user the best content possible.
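To make the protocol above concrete, the following is a minimal sketch of one standard exploration-exploitation strategy (epsilon-greedy, a textbook baseline, not the paper's method) for the content-selection setting: each arriving user is shown one option, a click (1) or no click (0) is observed, and with small probability the algorithm explores a random option instead of exploiting the empirically best one. All names and parameter values here are illustrative assumptions.

```python
import random

def epsilon_greedy_bandit(true_click_rates, n_users, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy content selection.

    true_click_rates: hypothetical per-option click probabilities
        (unknown to the algorithm; used only to simulate outcomes).
    Returns (shows, total_clicks): how often each option was shown,
    and the total number of observed clicks.
    """
    rng = random.Random(seed)
    n = len(true_click_rates)
    shows = [0] * n    # times each option was presented
    clicks = [0] * n   # observed clicks per option
    total_clicks = 0
    for _ in range(n_users):
        if min(shows) == 0 or rng.random() < epsilon:
            # Explore: a random option, possibly suboptimal for this user.
            arm = rng.randrange(n)
        else:
            # Exploit: the option with the highest empirical click rate.
            arm = max(range(n), key=lambda i: clicks[i] / shows[i])
        reward = 1 if rng.random() < true_click_rates[arm] else 0
        shows[arm] += 1
        clicks[arm] += reward
        total_clicks += reward
    return shows, total_clicks

shows, total = epsilon_greedy_bandit([0.2, 0.5, 0.3], n_users=10_000)
```

Note how the cost of exploration falls on whichever users happen to arrive during exploration rounds; the paper's concern is that this cost may be distributed unevenly across population groups.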