Group Fairness in Bandit Arm Selection

We propose a novel formulation of group fairness in the contextual multi-armed bandit (CMAB) setting. In the CMAB setting, a sequential decision maker must, at each time step, observe some context for each potential arm pull and then choose one arm to pull from a finite set of arms. In our model, arms are partitioned into two or more sensitive groups based on a protected feature (e.g., age, race, or socio-economic status). Although expected payouts may differ between groups, we may still wish to ensure some form of fairness in how often arms from the various groups are picked. In this work we explore two definitions of fairness: equal group probability, wherein the probability of pulling an arm from each protected group is the same; and proportional parity, wherein the probability of pulling an arm from a particular group is proportional to the size of that group. We provide a novel algorithm that accommodates either notion of fairness for an arbitrary number of groups, and we bound its regret. We then validate our algorithm on synthetic data as well as on two real-world datasets from intervention settings, in which resources should be allocated fairly across protected groups.
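The two fairness definitions can be read as target distributions over groups. The Python sketch below illustrates that reading; it is not the algorithm proposed in this work. The names `target_group_probs` and `fair_pull` are hypothetical. The within-group rule, which pulls the arm with the highest current score, is an assumption: the score stands in for whatever context-dependent estimate a CMAB learner maintains, such as a UCB index or a Thompson sample.

```python
import random
from collections import Counter


def target_group_probs(group_sizes, mode):
    """Target probability of pulling an arm from each group.

    group_sizes maps group name -> number of arms in that group.
    mode="equal":        every group gets 1 / (number of groups).
    mode="proportional": a group gets (its size) / (total number of arms).
    """
    if mode == "equal":
        return {g: 1.0 / len(group_sizes) for g in group_sizes}
    if mode == "proportional":
        total = sum(group_sizes.values())
        return {g: size / total for g, size in group_sizes.items()}
    raise ValueError(f"unknown fairness mode: {mode!r}")


def fair_pull(arm_groups, scores, probs, rng):
    """Pick one arm so that group selection follows the target distribution.

    Step 1: sample a group according to `probs`. This makes the group-level
            fairness target hold regardless of the reward estimates.
    Step 2 (assumed rule, hypothetical): within that group, pull the arm with
            the highest current score, e.g. a UCB index or posterior sample
            computed from the arm's context.
    """
    groups = list(probs)
    group = rng.choices(groups, weights=[probs[g] for g in groups], k=1)[0]
    candidates = [arm for arm, g in arm_groups.items() if g == group]
    return max(candidates, key=lambda arm: scores[arm])


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical instance: three arms in group "A", one arm in group "B".
    arm_groups = {"a1": "A", "a2": "A", "a3": "A", "b1": "B"}
    scores = {"a1": 0.2, "a2": 0.9, "a3": 0.5, "b1": 0.1}
    sizes = Counter(arm_groups.values())
    for mode in ("equal", "proportional"):
        probs = target_group_probs(sizes, mode)
        pulls = [fair_pull(arm_groups, scores, probs, rng) for _ in range(10_000)]
        share_b = sum(arm_groups[a] == "B" for a in pulls) / len(pulls)
        print(mode, probs, f"observed share of group B: {share_b:.3f}")
```

In this toy instance, group A has three arms and group B has one. Equal group probability assigns each group probability 0.5. Proportional parity assigns A probability 0.75 and B probability 0.25. Because the group is sampled before any reward estimate is consulted, the fairness target holds by construction. All of the regret cost of fairness then comes from sometimes pulling the best arm of a lower-paying group.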
