Bandits with Budgets

We investigate multi-armed bandits with budgets, a natural model for ad-display optimization encountered in search engines. We provide asymptotic regret lower bounds satisfied by any algorithm, and propose algorithms which match those lower bounds. We consider two types of budgets: scenarios where the advertiser has a fixed budget over a time horizon, and scenarios where the amount of money available to spend is incremented in each time slot. Further, we consider two pricing models: one in which an advertiser is charged each time her ad is shown (i.e., for each impression), and one in which the advertiser is charged only if a user clicks on the ad. For all of these cases, we show that it is possible to achieve O(log(T)) regret. For both the cost-per-impression and cost-per-click models with a fixed budget, we provide regret lower bounds that apply to any uniformly good algorithm, and we show that B-KL-UCB, a natural variant of KL-UCB, is asymptotically optimal in these cases. Numerical experiments (based on a real-world data set) further suggest that the finite-time performance of B-KL-UCB is the same as or better than that of various previously proposed (UCB-like) algorithms, which is important when applying such algorithms to a real-world problem.
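For readers unfamiliar with KL-UCB, the following is a minimal sketch of the standard KL-UCB index for Bernoulli rewards (e.g., click or no click), on which B-KL-UCB builds. It is not the paper's B-KL-UCB: the budget handling is omitted, and the function names `kl_bernoulli` and `kl_ucb_index`, the exploration threshold log(t)/pulls, and the bisection tolerance are assumptions chosen for illustration.

```python
import math


def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clamp away from 0 and 1 to keep the logarithms finite
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t, tol=1e-6):
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), found by bisection.

    mean  : empirical mean reward of the arm, in [0, 1]
    pulls : number of times the arm has been played
    t     : current round (t >= 1)
    """
    if pulls == 0:
        return 1.0  # an unplayed arm gets the most optimistic index
    threshold = math.log(t) / pulls
    lo, hi = mean, 1.0
    # KL(mean, q) is increasing in q on [mean, 1], so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Hypothetical arm statistics: (empirical click rate, number of plays).
    arms = [(0.10, 50), (0.12, 20), (0.05, 200)]
    t = sum(n for _, n in arms)
    indices = [kl_ucb_index(m, n, t) for m, n in arms]
    print("play arm", indices.index(max(indices)))
```

At each round the policy plays the arm with the largest index; an arm with few plays gets a wider confidence interval and therefore a higher index than its empirical mean alone would suggest.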
