A Reward Shaping Approach for Reserve Price Optimization using Deep Reinforcement Learning

Real-time bidding (RTB) is the process of selling and buying online advertisements in real-time auctions. These auctions are run by header bidding partners or ad exchanges to sell publishers' ad placements. Ad exchanges run second-price auctions, and a reserve price must be set for each ad placement or impression. This reserve price is normally determined by the bids of header bidding partners. However, the ad exchange may be able to outbid a higher reserve price, so optimizing this value has a large effect on revenue. In this paper, we propose a deep reinforcement learning approach for adjusting the reserve price of individual impressions using contextual information. Ad exchanges normally return no information about the auction other than whether the impression was sold or unsold. This binary feedback is poorly suited to maximizing revenue because it carries no explicit information about the revenue obtained. To enrich the reward function, we develop a novel reward shaping approach that provides an informative reward signal to the reinforcement learning agent. In this approach, different intervals of the reserve price receive different weights, and the reward value of each interval is learned through a search procedure. Using a simulator, we test our method on a set of impressions. The results show that the proposed method achieves higher revenue than the baselines.
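The interval-based reward described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function name `shaped_reward`, the interval boundaries, the per-interval weights, and the fixed penalty for an unsold impression are all hypothetical. In the paper the reward value of each interval is learned through a search procedure; here the values are simply hard-coded.

```python
from bisect import bisect_right


def shaped_reward(reserve_price, sold, boundaries, weights, unsold_penalty=-1.0):
    """Map binary sold/unsold feedback to an interval-dependent reward.

    boundaries: sorted reserve-price cut points defining len(boundaries) + 1 intervals.
    weights: one reward value per interval (hypothetical, hand-picked here).
    unsold_penalty: reward returned when the impression is not sold (assumption).
    """
    if len(weights) != len(boundaries) + 1:
        raise ValueError("need exactly one weight per interval")
    if not sold:
        # The exchange only reports 'unsold', so every interval gets the same penalty.
        return unsold_penalty
    # Find the interval containing the reserve price; a price equal to a
    # cut point falls into the interval to its right.
    interval = bisect_right(boundaries, reserve_price)
    return weights[interval]


# Hypothetical setup: four reserve-price intervals with increasing reward.
boundaries = [0.5, 1.0, 2.0]
weights = [0.1, 0.4, 0.8, 1.0]

low = shaped_reward(0.3, True, boundaries, weights)     # first interval
high = shaped_reward(5.0, True, boundaries, weights)    # last interval
failed = shaped_reward(5.0, False, boundaries, weights) # unsold impression
```

With these illustrative values, a sale at a higher reserve price earns a larger reward than a sale at a lower one, which gives the agent a revenue-related signal even though the exchange reports only the sold or unsold status.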
