Advances in Bandits with Knapsacks

"Bandits with Knapsacks" (\BwK) is a general model for multi-armed bandits under supply/budget constraints. While worst-case regret bounds for \BwK are well-understood, we focus on logarithmic instance-dependent regret bounds. We largely resolve them for one limited resource other than time, and for known, deterministic resource consumption. We also bound regret within a given round ("simple regret"). One crucial technique analyzes the sum of the confidence terms of the chosen arms. This technique allows to import the insights from prior work on bandits without resources, which leads to several extensions.
