Reinforcement Learning with Sequential Information Clustering in Real-Time Bidding

Display advertising is a billion-dollar business and the primary source of income for many companies. In this setting, real-time bidding optimization is one of the most important problems: an intelligent policy determines the bid for each ad impression so that some global key performance indicators are optimized. Because the bidding environment is highly dynamic, many recent works use reinforcement learning algorithms to train bidding agents. However, the probability of any particular state occurring is typically low, and the state representations in current work lack sequential information, so the convergence speed and performance of deep reinforcement learning algorithms are disappointing. To tackle these two challenges in the real-time bidding scenario, we propose ClusterA3C, a novel Asynchronous Advantage Actor-Critic (A3C) variant integrated with a sequential information extraction scheme and a clustering-based state aggregation scheme. We conduct extensive experiments to validate the proposed scheme on a real-world commercial dataset. Experimental results show that the proposed scheme outperforms state-of-the-art methods in terms of either performance or convergence speed.
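To make the idea of clustering-based state aggregation concrete, the following is a minimal sketch and not the ClusterA3C implementation. The abstract does not say which clustering method is used, so plain k-means with a deterministic farthest-point initialization is assumed here. The two state features (remaining budget fraction, remaining time fraction) and all function names are hypothetical. Rarely visited raw states are mapped to a small number of cluster ids, so that experience gathered in one state can be shared with its neighbours.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(state, centroids):
    """Index of the centroid closest to the given state."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(state, centroids[i]))


def init_centroids(states, k):
    """Farthest-point initialization: deterministic and well spread out."""
    centroids = [list(states[0])]
    while len(centroids) < k:
        farthest = max(
            states,
            key=lambda s: min(squared_distance(s, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def fit_centroids(states, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) over the observed states."""
    centroids = init_centroids(states, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for s in states:
            groups[nearest_centroid(s, centroids)].append(s)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


# Hypothetical states: (remaining budget fraction, remaining time fraction).
states = [(0.90, 0.95), (0.88, 0.90), (0.10, 0.15),
          (0.12, 0.10), (0.50, 0.50), (0.52, 0.48)]

centroids = fit_centroids(states, k=3)
cluster_ids = [nearest_centroid(s, centroids) for s in states]
print(cluster_ids)  # nearby states share the same aggregated state id
```

In a sketch like this, the agent would condition its policy and value estimates on the cluster id (or the centroid) rather than on the raw state. The number of clusters `k` trades off how much experience is shared against how much detail of the original state is lost.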
