Fair Contextual Multi-Armed Bandits: Theory and Experiments

When an AI system interacts with multiple users, it frequently needs to make allocation decisions. For instance, a virtual agent decides whom to pay attention to in a group setting, and a factory robot selects a worker to deliver a part to. Demonstrating fairness in decision making is essential for such systems to be broadly accepted. We introduce a Multi-Armed Bandit algorithm with fairness constraints, where fairness is defined as a minimum rate at which a task or resource is assigned to each user. The proposed algorithm uses contextual information about the users and the task, and makes no assumptions about how the losses capturing the performance of different users are generated. We provide theoretical performance guarantees and empirical results from simulations and an online user study. The results highlight the benefit of accounting for contexts in fair decision making, especially when users perform better in some contexts and worse in others.
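
To make the fairness mechanism concrete, here is a minimal, hypothetical sketch, assuming an Exp3-style exponential-weights learner (suited to adversarially generated losses) combined with per-user probability floors that enforce minimum assignment rates. It omits the contextual component and illustrates only the constrained-sampling idea; it is not the authors' exact algorithm, and all names and parameters are invented for the example.

import numpy as np

class FairExp3:
    """Adversarial bandit with per-arm fairness floors (illustrative sketch).

    Arm k (a user) is selected with probability at least floors[k]; the
    remaining probability mass is allocated by exponential weights over
    importance-weighted loss estimates, as in Exp3.
    """

    def __init__(self, n_arms, floors, eta=0.1):
        assert sum(floors) < 1.0, "floors must leave some free mass"
        self.floors = np.asarray(floors, dtype=float)
        self.eta = eta
        self.log_weights = np.zeros(n_arms)

    def probabilities(self):
        # Numerically stable softmax over cumulative (negated) loss estimates.
        w = np.exp(self.log_weights - self.log_weights.max())
        p_exp3 = w / w.sum()
        free_mass = 1.0 - self.floors.sum()
        # Guaranteed floor per arm plus the Exp3 share of the free mass.
        return self.floors + free_mass * p_exp3

    def select(self, rng):
        p = self.probabilities()
        return rng.choice(len(p), p=p), p

    def update(self, arm, loss, p):
        # Importance weighting keeps the loss estimate unbiased; the floor
        # bounds p[arm] away from zero, so the estimate stays bounded.
        self.log_weights[arm] -= self.eta * loss / p[arm]

A toy run: each of three users must receive at least 10% of the tasks, while the learner steers the remaining 70% of probability mass toward the user with the lowest observed losses.

rng = np.random.default_rng(0)
bandit = FairExp3(n_arms=3, floors=[0.1, 0.1, 0.1])
for t in range(1000):
    arm, p = bandit.select(rng)
    loss = rng.uniform(0.0, 0.5) if arm == 0 else rng.uniform(0.5, 1.0)
    bandit.update(arm, loss, p)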
