Control problems in online advertising and benefits of randomized bidding strategies

Abstract: Online advertising is a US$600 billion industry in which feedback control has come to play a critical role. The control problems are challenging: they involve nonlinearities including discontinuities, high dimensionality, uncertainties, non-Gaussian noise, and more. In this paper, systems engineering principles are applied to a core optimization problem within online advertising. First, we demonstrate how the optimization problem may be decomposed into separate low-level estimation and high-level control modules. Then we derive a plant model from first principles to show how uncertainties and noise propagate through the plant. The plant model reveals the challenges of the control problem and provides a framework for assessing how different designs of the low-level estimation module affect plant behavior. Thereafter, we describe a bid randomization technique that can be used in various ways to improve the performance and robustness of the system. Finally, the bid randomization technique is used to develop an algorithm for exploration and exploitation of an auction-based network, furnishing a solution to the estimation subproblem above.
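To make the idea of randomized bidding concrete, below is a minimal Python sketch of one way such a layer might sit between a controller's nominal bid and the auction: each submitted bid is perturbed around the nominal value, so the resulting win/loss observations cover a range of prices and can feed the low-level estimation module. The multiplicative lognormal perturbation, the parameter values, and the simulated auction are assumptions made for illustration only, not the paper's actual construction.

import math
import random

def randomized_bid(nominal_bid, sigma=0.2):
    """Draw the submitted bid from a multiplicative lognormal perturbation of the nominal bid."""
    return nominal_bid * math.exp(random.gauss(0.0, sigma))

def simulated_auction(bid):
    """Stand-in auction: the bid wins if it exceeds a random clearing price."""
    clearing_price = random.lognormvariate(0.0, 0.5)
    return bid >= clearing_price

# Collect (bid, win) observations around the current operating point. In practice such
# observations would feed the low-level estimation module on which the controller relies.
nominal = 1.0
bids = [randomized_bid(nominal) for _ in range(5000)]
observations = [(b, simulated_auction(b)) for b in bids]
win_rate = sum(won for _, won in observations) / len(observations)
print(f"nominal bid {nominal:.2f}: empirical win rate {win_rate:.2f} over {len(observations)} randomized bids")

Beyond supplying exploration data, dithering the bid around its nominal value also tends to smooth the discontinuous win-rate response seen by the controller, which is one plausible mechanism behind the robustness benefits the abstract alludes to.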
