Paper information

LADDER: A Human-Level Bidding Agent for Large-Scale Real-Time Online Auctions

We present LADDER, the first deep reinforcement learning agent that can successfully learn control policies for large-scale real-world problems directly from raw inputs composed of high-level semantic information. The agent is based on DASQN, an asynchronous stochastic variant of DQN (Deep Q-Network). Its inputs are plain-text descriptions of the states of a game of incomplete information, namely large-scale real-time online auctions, and its rewards are auction profits of very large scale. We apply the agent to an essential portion of JD.com's online RTB (real-time bidding) advertising business and find that it easily beats the former state-of-the-art bidding policy, which had been carefully engineered and calibrated by human experts. During JD.com's June 18th anniversary sale, the agent increased the company's ad revenue from that portion by more than 50%, while the advertisers' ROI (return on investment) also improved significantly.

Yu Wang | Yang He | Mantian Li | Yuxiang Liu | Jiayi Liu | Jinghe Hu | Weipeng P. Yan | Jun Hao
