Introduction. Online learning algorithms are a key tool in web search and content optimization, adaptively learning what users want to see. In a typical application, each time a user arrives, the algorithm chooses among various content presentation options (e.g., news articles to display), the chosen content is presented to the user, and an outcome (e.g., a click) is observed. Such algorithms must balance exploration (making potentially suboptimal decisions for the sake of acquiring information) and exploitation (using this information to make better decisions) [3]. Exploration may degrade the experience of the current user, but it improves user experience in the long run.

Concerns have been raised about whether exploration in such scenarios could be unfair to some population groups, in the sense that some groups may experience too much of the downside of exploration without sufficient upside [2]. We initiate a formal study of this issue, continuing an active line of work on unfairness and bias in machine learning [4, 5, 7, 8, 11]. Our work differs from the line of research on meritocratic fairness in online learning [9, 10, 14], which considers the allocation of limited resources such as bank loans and requires that nobody be passed over in favor of a less qualified applicant. We study a fundamentally different scenario in which there are no allocation constraints and we would like to serve each user the best content possible.
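To make the explore/exploit loop described above concrete, the following sketch (not from the paper; the arm probabilities and the epsilon-greedy rule are illustrative assumptions) simulates choosing between two articles, occasionally exploring a random article and otherwise exploiting the one with the best observed click rate:

```python
import random

def epsilon_greedy(arms, rounds, epsilon=0.1, rng=random.Random(0)):
    """Epsilon-greedy bandit: each arm is a callable returning a
    stochastic reward (1 for a click, 0 otherwise)."""
    counts = [0] * len(arms)    # times each arm was chosen
    totals = [0.0] * len(arms)  # cumulative reward per arm
    for _ in range(rounds):
        if rng.random() < epsilon:
            i = rng.randrange(len(arms))        # explore: pick a random arm
        else:
            # exploit: pick the arm with the best empirical mean;
            # an untried arm gets +inf so every arm is sampled at least once
            means = [totals[j] / counts[j] if counts[j] else float("inf")
                     for j in range(len(arms))]
            i = means.index(max(means))
        reward = arms[i]()
        counts[i] += 1
        totals[i] += reward
    return counts, totals

# Two hypothetical "articles" with click probabilities 0.2 and 0.6.
reward_rng = random.Random(1)
arms = [lambda: 1 if reward_rng.random() < 0.2 else 0,
        lambda: 1 if reward_rng.random() < 0.6 else 0]
counts, totals = epsilon_greedy(arms, rounds=5000)
```

After enough rounds the higher-payoff article dominates the choices, while the epsilon fraction of users still receives exploratory (possibly worse) content; this is exactly the per-user cost of exploration whose distribution across groups the paper studies.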
[1] John Langford et al. Making Contextual Decisions with Low Technical Debt. 2016.
[2] Alexandra Chouldechova et al. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 2016.
[3] Yang Liu et al. Calibrated Fairness in Bandits. ArXiv, 2017.
[4] Aaron Roth et al. Meritocratic Fairness for Cross-Population Selection. ICML, 2017.
[5] J. Langford et al. The Epoch-Greedy algorithm for contextual multi-armed bandits. NIPS, 2007.
[6] Sébastien Bubeck et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Found. Trends Mach. Learn., 2012.
[7] Toniann Pitassi et al. Fairness through awareness. ITCS '12, 2011.
[8] Wei Chu et al. Contextual Bandits with Linear Payoff Functions. AISTATS, 2011.
[9] Fernando Diaz et al. Exploring or Exploiting? Social and Ethical Implications of Autonomous Experimentation in AI. 2016.
[10] Aaron Roth et al. Fairness in Learning: Classic and Contextual Bandits. NIPS, 2016.
[11] Nathan Srebro et al. Equality of Opportunity in Supervised Learning. NIPS, 2016.
[12] Wei Chu et al. A contextual-bandit approach to personalized news article recommendation. WWW '10, 2010.
[13] Jon M. Kleinberg et al. Inherent Trade-Offs in the Fair Determination of Risk Scores. ITCS, 2016.