Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty

Corresponding author: Nicola Gatti (nicola.gatti@polimi.it)
Preprint submitted to Journal of LaTeX Templates, January 19, 2022

In online marketing, advertisers typically face a tradeoff between achieving high volumes and high profitability. Companies’ business units customarily address this tradeoff by maximizing volumes while guaranteeing a lower bound on the Return On Investment (ROI). Technically, this task can be naturally modeled as a combinatorial optimization problem subject to ROI and budget constraints, to be solved online since the parameter values are uncertain and must be estimated as data arrive sequentially. In this setting, the uncertainty over the constraints’ parameters plays a crucial role. Indeed, the constraints can be arbitrarily violated during the learning process due to uncontrolled exploration by the algorithms, and such violations represent one of the major obstacles to the adoption of automatic techniques in real-world applications, as advertisers often consider them unacceptable risks. Thus, to make humans trust online learning tools, it is of paramount importance to control the algorithms’ exploration so as to mitigate risk and provide safety guarantees during the entire learning process. In this paper, we study the nature of both the optimization and the learning problems. In particular, focusing on the optimization problem without uncertainty, we show that it is inapproximable within any factor unless P = NP, and we provide a pseudo-polynomial-time algorithm that achieves an optimal solution. When considering uncertainty, we prove that no online learning algorithm can violate the (ROI or budget) constraints only a sublinear number of times during the learning process while also guaranteeing sublinear pseudo-regret. Thus, we provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraint violations.
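To make the offline setup concrete, the following is a minimal sketch (with hypothetical data structures, and a brute-force search rather than the pseudo-polynomial algorithm studied in the paper): each campaign offers a finite set of bid options, each with an estimated volume, cost, and revenue, and we pick one option per campaign to maximize total volume subject to the budget cap and the ROI lower bound.

```python
from itertools import product

def optimize_bids(campaigns, budget, roi_min):
    """Exhaustively pick one bid option per campaign, maximizing total
    volume subject to a budget cap and a lower bound roi_min on the
    aggregate ROI (total revenue / total cost).

    campaigns: list of lists of dicts with keys "volume", "cost", "revenue".
    Returns (best_volume, best_choice); best_choice is None if infeasible.
    """
    best_volume, best_choice = -1.0, None
    for choice in product(*campaigns):
        volume = sum(o["volume"] for o in choice)
        cost = sum(o["cost"] for o in choice)
        revenue = sum(o["revenue"] for o in choice)
        if cost > budget:
            continue
        # ROI constraint written as revenue >= roi_min * cost, which also
        # handles the zero-spend case without dividing by zero.
        if revenue < roi_min * cost:
            continue
        if volume > best_volume:
            best_volume, best_choice = volume, choice
    return best_volume, best_choice
```

This enumeration is exponential in the number of campaigns and is meant only to fix the problem's structure; the paper's pseudo-polynomial-time algorithm instead exploits the discretized budget to avoid enumerating all bid combinations.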
We also design its safe version, namely GCBsafe, guaranteeing w.h.p. a constant upper bound on the number of constraint violations at the cost of linear pseudo-regret. More interestingly, inspired by the previous two algorithms, we provide an algorithm, namely GCBsafe(ψ, φ), guaranteeing both sublinear pseudo-regret and safety w.h.p., at the cost of accepting tolerances ψ and φ in the satisfaction of the ROI and budget constraints, respectively. This algorithm mitigates the risks due to constraint violations without precluding convergence to the optimal solution. Finally, we experimentally compare our algorithms in terms of the pseudo-regret/constraint-violation tradeoff in settings generated from real-world data, showing the importance of adopting safety constraints in practice and the effectiveness of our algorithms.
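The safe-exploration principle behind the GCBsafe variants can be illustrated with a toy selection rule (the names, data layout, and confidence-width formula below are illustrative assumptions, not the paper's Gaussian-process-based construction): act optimistically on the objective only among options that remain feasible under pessimistic estimates of the constrained quantities.

```python
import math

def safe_select(stats, t, budget, roi_min, beta=2.0):
    """Toy safe-optimistic bid selection at round t.

    stats maps each candidate bid to hypothetical empirical means and a
    pull count: {"n": int, "cost": float, "revenue": float, "volume": float}.
    A bid is eligible only if the budget and ROI constraints hold under
    pessimistic estimates (high cost, low revenue); among eligible bids,
    the one with the largest optimistic volume estimate is chosen.
    Returns None when no bid passes the safety filter.
    """
    best, best_ucb = None, -float("inf")
    for bid, s in stats.items():
        n = max(s["n"], 1)
        width = beta * math.sqrt(math.log(max(t, 2)) / n)  # confidence width
        cost_ucb = s["cost"] + width        # pessimistic (high) cost
        revenue_lcb = s["revenue"] - width  # pessimistic (low) revenue
        # safety filter: constraints must hold even in the pessimistic case
        if cost_ucb > budget or revenue_lcb < roi_min * cost_ucb:
            continue
        volume_ucb = s["volume"] + width    # optimistic volume
        if volume_ucb > best_ucb:
            best, best_ucb = bid, volume_ucb
    return best
```

The design choice this sketch highlights is the asymmetry of the bounds: optimism drives exploration of the objective, while pessimism on the constraints is what yields high-probability safety, at the price of extra conservatism (and hence regret) exactly as the abstract's impossibility result suggests.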
