LADDER: A Human-Level Bidding Agent for Large-Scale Real-Time Online Auctions

We present LADDER, the first deep reinforcement learning agent that can successfully learn control policies for large-scale real-world problems directly from raw inputs composed of high-level semantic information. The agent is based on an asynchronous stochastic variant of DQN (Deep Q Network) named DASQN. The inputs of the agent are plain-text descriptions of states of a game of incomplete information, i.e. real-time large scale online auctions, and the rewards are auction profits of very large scale. We apply the agent to an essential portion of JD's online RTB (real-time bidding) advertising business and find that it easily beats the former state-of-the-art bidding policy that had been carefully engineered and calibrated by human experts: during JD.com's June 18th anniversary sale, the agent increased the company's ads revenue from the portion by more than 50%, while the advertisers' ROI (return on investment) also improved significantly.

[1]  Nikhil R. Devanur,et al.  Real-time bidding algorithms for performance-based display ad allocation , 2011, KDD.

[2]  Regina Barzilay,et al.  Language Understanding for Text-based Games using Deep Reinforcement Learning , 2015, EMNLP.

[3]  Jun Wang,et al.  Real-time bidding for online advertising: measurement and analysis , 2013, ADKDD '13.

[4]  Xiang Zhang,et al.  Character-level Convolutional Networks for Text Classification , 2015, NIPS.

[5]  Sergey Levine,et al.  Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.

[6]  Robert Gibbons,et al.  A primer in game theory , 1992 .

[7]  David Silver,et al.  Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.

[8]  Roger B. Myerson,et al.  Optimal Auction Design , 1981, Math. Oper. Res..

[9]  L. V. Allis,et al.  Searching for solutions in games and artificial intelligence , 1994 .

[10]  Tom Schaul,et al.  Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.

[11]  Heng-Tze Cheng,et al.  Wide & Deep Learning for Recommender Systems , 2016, DLRS@RecSys.

[12]  G. Monahan State of the Art—A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms , 1982 .

[13]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[14]  Peter Stone,et al.  Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.

[15]  Michael I. Jordan,et al.  Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.

[16]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[17]  Leslie Pack Kaelbling,et al.  Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..

[18]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[19]  Martin Wattenberg,et al.  Ad click prediction: a view from the trenches , 2013, KDD.

[20]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[21]  Steffen Rendle,et al.  Factorization Machines with libFM , 2012, TIST.

[22]  Ashish Agarwal,et al.  Overlapping experiment infrastructure: more, better, faster experimentation , 2010, KDD.