Achieving Fairness in the Stochastic Multi-armed Bandit Problem

We study a variant of the stochastic multi-armed bandit problem, called the Fair-SMAB problem, where each arm must be pulled at least a given fraction of the total number of rounds. We investigate the interplay between learning and fairness in terms of a pre-specified vector denoting the fractions of guaranteed pulls. We define a fairness-aware regret, called r-Regret, that accounts for these fairness constraints and naturally extends the conventional notion of regret. Our primary contribution is to characterize a class of Fair-SMAB algorithms by two parameters: the unfairness tolerance and the learning algorithm used as a black box. We provide a fairness guarantee for this class that holds uniformly over time, irrespective of the choice of the learning algorithm. In particular, when the learning algorithm is UCB1, we show that our algorithm achieves O(log T) r-Regret. Finally, we evaluate the cost of fairness in terms of the conventional notion of regret.
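
To make the two-parameter design concrete, here is a minimal sketch of such a fairness-constrained wrapper in Python, assuming the rule "force a pull of arm i whenever its deficit r_i * t - N_i(t) exceeds the tolerance alpha, otherwise defer to the black-box learner (UCB1 here)". The class name FairBandit, the parameter names r and alpha, and the toy Bernoulli run are illustrative choices, not the paper's exact implementation.

```python
import math
import random


class FairBandit:
    """Sketch of a fairness-constrained bandit meta-algorithm.

    r[i] is the guaranteed fraction of pulls for arm i (sum(r) <= 1 for
    feasibility), and alpha is the unfairness tolerance: arm i's pull count
    is never allowed to lag more than alpha behind r[i] * t. When no deficit
    exceeds the tolerance, arm selection is delegated to a black-box
    learning algorithm, instantiated here as UCB1.
    """

    def __init__(self, n_arms, r, alpha=1.0):
        assert sum(r) <= 1.0, "guaranteed fractions must be feasible"
        self.n_arms = n_arms
        self.r = r
        self.alpha = alpha
        self.counts = [0] * n_arms      # N_i(t): pulls of each arm so far
        self.sums = [0.0] * n_arms      # cumulative reward of each arm
        self.t = 0

    def select_arm(self):
        self.t += 1
        # Fairness phase: if any deficit r_i * t - N_i(t) exceeds alpha,
        # force a pull of the arm with the largest deficit.
        deficits = [self.r[i] * self.t - self.counts[i]
                    for i in range(self.n_arms)]
        i_star = max(range(self.n_arms), key=lambda i: deficits[i])
        if deficits[i_star] > self.alpha:
            return i_star
        # Learning phase: defer to the black-box learner (UCB1 shown).
        for i in range(self.n_arms):
            if self.counts[i] == 0:
                return i  # play each arm once before using UCB indices
        ucb = [self.sums[i] / self.counts[i]
               + math.sqrt(2.0 * math.log(self.t) / self.counts[i])
               for i in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda i: ucb[i])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


# Toy run with Bernoulli arms (means chosen for illustration only).
means = [0.9, 0.5, 0.4]
bandit = FairBandit(n_arms=3, r=[0.2, 0.2, 0.2], alpha=1.0)
for _ in range(10000):
    arm = bandit.select_arm()
    bandit.update(arm, 1.0 if random.random() < means[arm] else 0.0)
print(bandit.counts)  # each arm receives roughly >= 0.2 of all pulls
```

Note how the fairness guarantee in this sketch is independent of the learner: the deficit check runs before the black-box step every round, so each arm's pull count stays within alpha of its guaranteed share no matter which index policy fills the remaining rounds.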
