Real-Time Bidding with Side Information

We consider the problem of repeated bidding in online advertising auctions when some side information (e.g. browser cookies) is available ahead of submitting a bid in the form of a $d$-dimensional vector. The goal for the advertiser is to maximize the total utility (e.g. the total number of clicks) derived from displaying ads given that a limited budget $B$ is allocated for a given time horizon $T$. Optimizing the bids is modeled as a contextual Multi-Armed Bandit (MAB) problem with a knapsack constraint and a continuum of arms. We develop UCB-type algorithms that combine two streams of literature: the confidence-set approach to linear contextual MABs and the probabilistic bisection search method for stochastic root-finding. Under mild assumptions on the underlying unknown distribution, we establish distribution-independent regret bounds of order $\tilde{O}(d \cdot \sqrt{T})$ when either $B = \infty$ or when $B$ scales linearly with $T$.
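The confidence-set approach to linear contextual bandits mentioned above can be illustrated with a minimal sketch. This is a generic ridge-regression upper confidence bound, not the algorithm developed in the paper: it ignores the budget constraint and the bisection search over bids, and the class name `LinearUCB` and the parameters `lam` (regularization) and `beta` (confidence width) are assumptions introduced only for illustration.

```python
import math


class LinearUCB:
    """Illustrative optimistic estimate of theta . x for a linear reward model.

    Hypothetical helper: maintains the inverse of A = lam * I + sum(x x^T)
    with rank-one (Sherman-Morrison) updates, using only the standard library.
    """

    def __init__(self, d, lam=1.0, beta=1.0):
        self.d = d
        self.beta = beta  # assumed confidence-width multiplier
        # Inverse of lam * I.
        self.A_inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)]
                      for i in range(d)]
        self.b = [0.0] * d  # running sum of reward * context

    def _matvec(self, M, v):
        return [sum(M[i][j] * v[j] for j in range(self.d))
                for i in range(self.d)]

    def ucb(self, x):
        """Return the ridge estimate of theta . x plus an exploration bonus."""
        theta_hat = self._matvec(self.A_inv, self.b)
        A_inv_x = self._matvec(self.A_inv, x)
        mean = sum(t * xi for t, xi in zip(theta_hat, x))
        quad = sum(xi * a for xi, a in zip(x, A_inv_x))
        return mean + self.beta * math.sqrt(max(0.0, quad))

    def update(self, x, reward):
        """Fold one (context, reward) observation into the estimate."""
        A_inv_x = self._matvec(self.A_inv, x)
        denom = 1.0 + sum(xi * a for xi, a in zip(x, A_inv_x))
        # Sherman-Morrison: (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / denom
        for i in range(self.d):
            for j in range(self.d):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
        for i in range(self.d):
            self.b[i] += reward * x[i]
```

In this sketch, a bidder would call `ucb(x)` on the context vector before each auction to obtain an optimistic value estimate, then call `update(x, reward)` once the outcome is observed. How such an estimate is turned into a bid under a budget is the part the paper addresses and is not shown here.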

Patrick Jaillet | Arthur Flajolet
