Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints

The multi-armed bandit (MAB) problem with fairness constraints has recently emerged as an important research topic. In such problems, a common objective is to maximize the total reward over a fixed number of rounds while satisfying a fairness requirement: each individual arm must be selected for at least a minimum fraction of rounds in the long run. Previous works have made substantial advances in designing efficient online selection algorithms; however, they fail to achieve a sublinear regret bound once such fairness constraints are incorporated. In this paper, we study a combinatorial MAB problem with a concave objective and fairness constraints. In particular, we adopt a new approach that combines online convex optimization with bandit methods to design selection algorithms. Our algorithm is computationally efficient and, more importantly, achieves a sublinear regret bound with probability guarantees. Finally, we evaluate our algorithm through extensive simulations and demonstrate that it substantially outperforms the baselines.
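To make the fairness requirement concrete, the following is a minimal sketch of how a minimum selection fraction could be checked on a finite selection history. It is illustrative only and is not the algorithm proposed in the paper: the function names `selection_fractions` and `meets_fairness`, the representation of each round as a set of arm indices, and the per-arm threshold list are all assumptions introduced here.

```python
from typing import List, Sequence, Set


def selection_fractions(history: Sequence[Set[int]], num_arms: int) -> List[float]:
    """Return, for each arm, the fraction of rounds in which it was selected.

    `history[t]` is the (hypothetical) set of arm indices chosen in round t.
    """
    rounds = len(history)
    if rounds == 0:
        return [0.0] * num_arms
    counts = [0] * num_arms
    for chosen in history:
        for arm in chosen:
            counts[arm] += 1
    return [count / rounds for count in counts]


def meets_fairness(history: Sequence[Set[int]], min_fractions: Sequence[float]) -> bool:
    """Check whether every arm reaches its assumed minimum selection fraction."""
    fractions = selection_fractions(history, len(min_fractions))
    return all(f >= r for f, r in zip(fractions, min_fractions))


# Four rounds over three arms, two arms selected per round.
example_history = [{0, 1}, {0, 2}, {1, 2}, {0, 1}]
example_fractions = selection_fractions(example_history, 3)
```

In this example arm 2 is selected in two of four rounds, so a threshold of 0.5 per arm is met while a threshold of 0.6 for arm 2 is not. The requirement stated above concerns the long run, so a finite-horizon check like this one is only an empirical proxy.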
