Deep Reinforcement Learning for Sponsored Search Real-time Bidding

Bidding optimization is one of the most critical problems in online advertising. Sponsored search (SS) auctions, because of the randomness of user query behavior and the nature of the platform, usually adopt keyword-level bidding strategies. In contrast, display advertising (DA), a relatively simpler auction scenario, has taken advantage of real-time bidding (RTB) to boost performance for advertisers. In this paper, we consider the RTB problem in sponsored search auctions, named SS-RTB. SS-RTB has a much more complex dynamic environment, owing to stochastic user query behavior and to more complex bidding policies based on the multiple keywords of an ad. Most previous methods for DA cannot be applied. We propose a reinforcement learning (RL) solution for handling this complex dynamic environment. Although some RL methods have been proposed for online advertising, they all fail to address the "environment changing" problem: the state transition probabilities vary from one day to the next. Motivated by the observation that the auction sequences of two days share similar transition patterns at a proper aggregation level, we formulate a robust MDP model at the hour-aggregation level of the auction data and propose a control-by-model framework for SS-RTB. Rather than generating bid prices directly, we decide on a bidding model for the impressions of each hour and perform real-time bidding accordingly. We also extend the method to handle the multi-agent problem. We deployed the SS-RTB system on the e-commerce search auction platform of Alibaba. Offline evaluation and online A/B tests demonstrate the effectiveness of our method.
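To make the control-by-model idea concrete, the following is a minimal Python sketch, not the deployed system. It assumes a hypothetical linear bidding model (bid = alpha times a predicted impression value), a hypothetical hourly state with only two features, and a placeholder Q-function `toy_q` standing in for a learned value estimate; none of these names or choices come from the abstract. The agent acts once per hour by choosing the model parameter, and every impression in that hour is then bid with the chosen model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HourState:
    """Hypothetical hour-level state; the real feature set is not given here."""
    hour: int            # hour of day, 0..23
    budget_left: float   # remaining budget


def make_bid_model(alpha: float) -> Callable[[float], float]:
    """Return a per-impression bidding model (assumed linear form)."""
    def bid(predicted_value: float) -> float:
        return alpha * predicted_value
    return bid


def choose_alpha(state: HourState,
                 alphas: List[float],
                 q_value: Callable[[HourState, float], float]) -> float:
    """Hourly action: pick the model parameter with the highest Q estimate."""
    return max(alphas, key=lambda a: q_value(state, a))


def toy_q(state: HourState, alpha: float) -> float:
    """Placeholder for a learned Q-function (an assumption for illustration).

    It simply prefers an alpha close to budget_left / 100.
    """
    target = state.budget_left / 100.0
    return -abs(alpha - target)


# One decision per hour: select the bidding model, then bid per impression.
state = HourState(hour=9, budget_left=150.0)
alpha = choose_alpha(state, [0.5, 1.0, 1.5, 2.0], toy_q)
bid_model = make_bid_model(alpha)

# Predicted values of two impressions arriving within this hour (made up).
bids = [bid_model(v) for v in (0.02, 0.10)]
```

The point of the split is that the learned policy operates on hour-aggregated states, where transition patterns are described as more stable across days, while the per-impression bid is still computed in real time from the selected model.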