A Multi-Arm Bandit Approach To Subset Selection Under Constraints

We explore a class of problems in which a central planner must select a subset of agents, each with a different quality and cost. The planner wants to maximize its utility while ensuring that the average quality of the selected agents stays above a certain threshold. When the agents' qualities are known, we formulate the problem as an integer linear program (ILP) and propose a deterministic algorithm, DPSS, that solves this ILP exactly. We then consider the setting in which the agents' qualities are unknown. We model this as a Multi-Armed Bandit (MAB) problem and propose DPSS-UCB to learn the qualities over multiple rounds. We show that after a certain number of rounds, τ, DPSS-UCB outputs a subset of agents that satisfies the average-quality constraint with high probability. Next, we provide bounds on τ and prove that after τ rounds the algorithm incurs a regret of O(ln T), where T is the total number of rounds. We further illustrate the efficacy of DPSS-UCB through simulations. To overcome the computational limitations of DPSS, we propose a polynomial-time greedy algorithm, GSS, that provides an approximate solution to our ILP, and we compare the performance of DPSS and GSS through experiments.
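The constrained subset-selection problem in the known-qualities setting can be sketched as follows. This is a minimal brute-force illustration, not the paper's DPSS algorithm: the exact utility function is not specified in the abstract, so we assume (hypothetically) that utility is total quality minus total cost, and that the constraint is an average quality of at least a threshold `alpha`.

```python
from itertools import combinations

def select_subset(qualities, costs, alpha):
    """Brute-force the constrained subset-selection problem.

    Assumed (hypothetical) objective: maximize total quality minus
    total cost over the chosen agents, subject to their average
    quality being at least `alpha`. Exponential in the number of
    agents; meant only to illustrate the problem structure.
    """
    n = len(qualities)
    best_utility, best_subset = float("-inf"), None
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            avg_quality = sum(qualities[i] for i in subset) / r
            if avg_quality < alpha:
                continue  # violates the average-quality constraint
            utility = sum(qualities[i] - costs[i] for i in subset)
            if utility > best_utility:
                best_utility, best_subset = utility, set(subset)
    return best_subset, best_utility
```

DPSS reaches the exact optimum of the ILP via dynamic programming rather than enumeration, and GSS trades exactness for polynomial time; this sketch only makes the feasibility/objective trade-off concrete.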
