Fairness of Exposure in Stochastic Bandits

Contextual bandit algorithms have become widely used for recommendation in online systems (e.g., marketplaces, music streaming, news), where they now wield substantial influence over which items are exposed to users. This raises questions of fairness to the items, and to the sellers, artists, and writers who benefit from this exposure. We argue that the conventional bandit formulation can lead to an undesirable and unfair winner-takes-all allocation of exposure. To remedy this problem, we propose a new bandit objective that guarantees merit-based fairness of exposure to the items while optimizing utility to the users. We formulate fairness regret and reward regret in this setting, and present algorithms for both stochastic multi-armed bandits and stochastic linear bandits. We prove that the algorithms achieve sub-linear fairness regret and reward regret. Beyond the theoretical analysis, we also provide empirical evidence that these algorithms allocate exposure to different arms fairly and effectively.
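
To make the contrast in the abstract concrete, the following minimal sketch compares a winner-takes-all allocation with one plausible reading of merit-based exposure, in which each arm's selection probability is proportional to a merit function of its expected reward. The abstract does not state the exact fairness criterion, so the proportionality rule, the identity merit function, and all function names here are illustrative assumptions and not the paper's algorithm. The mean rewards are assumed known, whereas a bandit algorithm would have to estimate them.

```python
def winner_takes_all(mean_rewards):
    """Conventional target: all exposure goes to the single best arm."""
    best = max(range(len(mean_rewards)), key=lambda i: mean_rewards[i])
    return [1.0 if i == best else 0.0 for i in range(len(mean_rewards))]


def merit_proportional(mean_rewards, merit=lambda mu: mu):
    """Assumed fairness target: exposure proportional to merit(mean reward).

    `merit` is a hypothetical merit function; it must be strictly positive
    on every arm so that the normalization is well defined.
    """
    merits = [merit(mu) for mu in mean_rewards]
    if any(m <= 0 for m in merits):
        raise ValueError("merit must be strictly positive for every arm")
    total = sum(merits)
    return [m / total for m in merits]


def expected_reward(policy, mean_rewards):
    """Expected per-round reward when arms are drawn according to `policy`."""
    return sum(p * mu for p, mu in zip(policy, mean_rewards))


if __name__ == "__main__":
    mu = [0.6, 0.4]  # hypothetical mean rewards of two arms
    for name, policy in [("winner-takes-all", winner_takes_all(mu)),
                         ("merit-proportional", merit_proportional(mu))]:
        print(name, policy, expected_reward(policy, mu))
```

With two arms whose mean rewards are close, the winner-takes-all target gives the slightly weaker arm no exposure at all, while the merit-proportional target gives it a share matching its merit at the cost of some expected reward. That trade-off is what the two regret notions in the abstract are meant to measure.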
