Diversity-Preserving K-Armed Bandits, Revisited

We consider the bandit-based framework for diversity-preserving recommendations introduced by Celis et al. (2019), who approached it mainly through a reduction to the setting of linear bandits. We design a UCB algorithm that exploits the specific structure of the setting and show that it enjoys bounded distribution-dependent regret in the natural case where the optimal mixed actions put some probability mass on all actions (i.e., when diversity is desirable). Simulations illustrate this fact. We also provide regret lower bounds and briefly discuss distribution-free regret bounds.
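
As a rough illustration (not the paper's exact algorithm), the sketch below implements a UCB rule of the kind described above, under the illustrative assumptions that the diversity constraint forces at least a prescribed mass lower_bounds[a] on each arm a and that rewards are Bernoulli; the function name and all parameters are placeholders.

    import numpy as np

    def diversity_preserving_ucb(arm_means, lower_bounds, horizon, seed=None):
        # Sketch of a UCB rule for diversity-preserving bandits, assuming the
        # constraint set is {p : p[a] >= lower_bounds[a] for all a, sum(p) = 1}
        # and Bernoulli rewards; both modelling choices are illustrative only.
        rng = np.random.default_rng(seed)
        K = len(arm_means)
        counts = np.zeros(K)                       # number of draws of each arm
        sums = np.zeros(K)                         # cumulative reward of each arm
        slack = 1.0 - float(np.sum(lower_bounds))  # mass left once constraints are met
        rewards = []
        for t in range(1, horizon + 1):
            # Per-arm UCB indices; arms never drawn yet get an infinite index.
            with np.errstate(divide="ignore", invalid="ignore"):
                means = np.where(counts > 0, sums / counts, np.inf)
                bonus = np.where(counts > 0, np.sqrt(2.0 * np.log(t) / counts), np.inf)
            index = means + bonus
            # The mixed action maximising <p, index> over the constraint set puts
            # the minimal mass on every arm and the leftover mass on the arm with
            # the largest index.
            p = np.array(lower_bounds, dtype=float)
            p[np.argmax(index)] += slack
            # Draw an arm from the mixed action and observe a Bernoulli reward.
            arm = rng.choice(K, p=p)
            reward = rng.binomial(1, arm_means[arm])
            counts[arm] += 1
            sums[arm] += reward
            rewards.append(reward)
        return np.asarray(rewards)

For instance, diversity_preserving_ucb(np.array([0.5, 0.6, 0.8]), np.full(3, 0.1), horizon=10000) runs the rule on three Bernoulli arms with at least 10% of the mass forced onto each of them.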

[1] Christopher Jung et al. Online Learning with an Unknown Fairness Metric. NeurIPS, 2018.

[2] Tor Lattimore et al. Bounded Regret for Finite-Armed Structured Bandits. NIPS, 2014.

[3] L. Elisa Celis et al. Controlling Polarization in Personalization: An Algorithmic Framework. FAT*, 2019.

[4] T. L. Graves et al. Asymptotically Efficient Adaptive Choice of Control Laws in Controlled Markov Chains, 1997.

[5] Vianney Perchet et al. Bandits with Side Observations: Bounded vs. Logarithmic Regret. UAI, 2018.

[6] Jean-Yves Audibert et al. Regret Bounds and Minimax Policies under Partial Monitoring. Journal of Machine Learning Research, 2010.

[7] Aaron Roth et al. Fairness in Learning: Classic and Contextual Bandits. NIPS, 2016.

[8] Christos Thrampoulidis et al. Linear Stochastic Bandits Under Safety Constraints. NeurIPS, 2019.

[9] Malte Jung et al. Reinforcement Learning with Fairness Constraints for Resource Distribution in Human-Robot Teams. arXiv, 2019.

[10] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

[11] Peter Auer et al. The Nonstochastic Multiarmed Bandit Problem. SIAM Journal on Computing, 2002.

[12] Aurélien Garivier et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond. COLT, 2011.

[13] Sheldon M. Ross. Stochastic Processes. Wiley, 1996.

[14] Aurélien Garivier et al. Explore First, Exploit Next: The True Shape of Regret in Bandit Problems. Mathematics of Operations Research, 2016.

[15] Tor Lattimore et al. The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits. AISTATS, 2016.

[16] Haipeng Luo et al. Fair Contextual Multi-Armed Bandits: Theory and Experiments. UAI, 2019.

[17] Wei Chu et al. A Contextual-Bandit Approach to Personalized News Article Recommendation. WWW, 2010.

[18] Alessandro Lazaric et al. A Novel Confidence-Based Algorithm for Structured Bandits. AISTATS, 2020.

[19] Jia Liu et al. Combinatorial Sleeping Bandits with Fairness Constraints. IEEE INFOCOM, 2019.

[20] Tor Lattimore et al. Adaptive Exploration in Linear Contextual Bandit. AISTATS, 2020.

[21] Toniann Pitassi et al. Fairness Through Awareness. ITCS, 2012.

[22] Y. Narahari et al. Achieving Fairness in the Stochastic Multi-armed Bandit Problem. AAAI, 2019.

[23] Marc Lelarge et al. Leveraging Side Observations in Stochastic Bandits. UAI, 2012.

[24] Wei Chu et al. Contextual Bandits with Linear Payoff Functions. AISTATS, 2011.

[25] Csaba Szepesvári et al. Improved Algorithms for Linear Stochastic Bandits. NIPS, 2011.

[26] Yang Liu et al. Calibrated Fairness in Bandits. arXiv, 2017.

[27] Kwang-Sung Jun et al. Crush Optimism with Pessimism: Structured Bandits Beyond Asymptotic Optimality. NeurIPS, 2020.

[28] Alexandre Proutière et al. Minimal Exploration in Structured Stochastic Bandits. NIPS, 2017.