A Multi-Arm Bandit Approach To Subset Selection Under Constraints

We explore a class of problems in which a central planner must select a subset of agents, each with a different quality and cost. The planner wants to maximize its utility while ensuring that the average quality of the selected agents stays above a certain threshold. When the agents' qualities are known, we formulate the problem as an integer linear program (ILP) and propose a deterministic algorithm, DPSS, that solves this ILP exactly. We then consider the setting in which the agents' qualities are unknown. We model this as a Multi-Armed Bandit (MAB) problem and propose DPSS-UCB to learn the qualities over multiple rounds. We show that after a certain number of rounds, τ, DPSS-UCB outputs a subset of agents that satisfies the average-quality constraint with high probability. Next, we provide bounds on τ and prove that after τ rounds the algorithm incurs a regret of O(ln T), where T is the total number of rounds. We further illustrate the efficacy of DPSS-UCB through simulations. To overcome the computational limitations of DPSS, we propose a polynomial-time greedy algorithm, GSS, that provides an approximate solution to our ILP, and we compare the performance of DPSS and GSS through experiments.
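The constrained subset-selection problem in the known-qualities setting can be sketched as follows. This is a minimal brute-force illustration, not the paper's DPSS algorithm: the exact utility function is not specified in the abstract, so we assume (hypothetically) that utility is total quality minus total cost, and that the constraint is an average quality of at least a threshold `alpha`.

```python
from itertools import combinations

def select_subset(qualities, costs, alpha):
    """Brute-force the constrained subset-selection problem.

    Assumed (hypothetical) objective: maximize total quality minus
    total cost over the chosen agents, subject to their average
    quality being at least `alpha`. Exponential in the number of
    agents; meant only to illustrate the problem structure.
    """
    n = len(qualities)
    best_utility, best_subset = float("-inf"), None
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            avg_quality = sum(qualities[i] for i in subset) / r
            if avg_quality < alpha:
                continue  # violates the average-quality constraint
            utility = sum(qualities[i] - costs[i] for i in subset)
            if utility > best_utility:
                best_utility, best_subset = utility, set(subset)
    return best_subset, best_utility
```

DPSS reaches the exact optimum of the ILP via dynamic programming rather than enumeration, and GSS trades exactness for polynomial time; this sketch only makes the feasibility/objective trade-off concrete.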
