Real-Time Bidding with Side Information

We consider the problem of repeated bidding in online advertising auctions when side information (e.g. browser cookies), in the form of a $d$-dimensional vector, is available before each bid is submitted. The advertiser's goal is to maximize the total utility (e.g. the total number of clicks) derived from displaying ads, subject to a limited budget $B$ allocated over a time horizon $T$. Optimizing the bids is modeled as a contextual Multi-Armed Bandit (MAB) problem with a knapsack constraint and a continuum of arms. We develop UCB-type algorithms that combine two streams of literature: the confidence-set approach to linear contextual MABs and the probabilistic bisection search method for stochastic root-finding. Under mild assumptions on the underlying unknown distribution, we establish distribution-independent regret bounds of order $\tilde{O}(d \cdot \sqrt{T})$ either when $B = \infty$ or when $B$ scales linearly with $T$.
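
To give a feel for the confidence-set ingredient mentioned above, the following is a minimal sketch of a generic ridge-regression upper confidence bound for a linear contextual bandit. It is not the algorithm developed in this work: it ignores the budget constraint, the continuum of bids, and the bisection search. The class name `LinearUCB` and the parameters `reg` and `beta` are hypothetical choices made for illustration.

```python
import numpy as np


class LinearUCB:
    """Illustrative ridge-regression estimator with an ellipsoidal bonus.

    Assumes the expected reward is linear in the d-dimensional context x.
    `reg` and `beta` are hypothetical tuning constants, not values from the paper.
    """

    def __init__(self, d, reg=1.0, beta=1.0):
        self.A = reg * np.eye(d)  # regularized Gram matrix: reg * I + sum of x x^T
        self.b = np.zeros(d)      # running sum of reward-weighted contexts
        self.beta = beta          # radius of the confidence ellipsoid

    def ucb(self, x):
        """Optimistic reward estimate for context x."""
        theta_hat = np.linalg.solve(self.A, self.b)        # ridge estimate
        width = np.sqrt(x @ np.linalg.solve(self.A, x))    # sqrt(x^T A^{-1} x)
        return float(theta_hat @ x + self.beta * width)

    def update(self, x, reward):
        """Incorporate one observed (context, reward) pair."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

In this sketch the bonus term shrinks along directions of the context space that have been observed often, so the optimistic estimate approaches the ridge estimate as data accumulates.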
