Reinforcement Mechanism Design, with Applications to Dynamic Pricing in Sponsored Search Auctions

In this study, we apply reinforcement learning techniques and propose what we call reinforcement mechanism design to tackle the dynamic pricing problem in sponsored search auctions. In contrast to previous game-theoretic approaches, which rely heavily on rationality and common knowledge among the bidders, we take a data-driven approach and try to learn the set of optimal reserve prices over repeated interactions. We implement our approach within the current sponsored search framework of a major search engine. We first train a buyer behavior model on a real bidding data set; the model accurately predicts bids given the information that bidders are aware of, including the game parameters disclosed by the search engine and the bidders' KPI data from previous rounds. We then put forward a reinforcement learning algorithm, based on a Markov decision process (MDP), that optimizes reserve prices over time in a GSP-like (generalized second-price) auction. Our simulations demonstrate that our framework outperforms static optimization strategies, including the ones currently in use, as well as several other dynamic ones.
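To make the role of reserve prices concrete, the following is a minimal sketch of a single round of a textbook GSP auction with per-bidder reserve prices. The abstract only says the auction is "GSP-like", so the exact rules here are assumptions for illustration: bidders are ranked by bid alone (no quality scores), a bidder whose bid is below their reserve is excluded, and each winner pays per click the larger of the next eligible bid and their own reserve. The function names `gsp_with_reserve` and `expected_revenue` and the example numbers are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def gsp_with_reserve(
    bids: List[float], reserves: List[float], slot_ctrs: List[float]
) -> List[Tuple[int, float, float]]:
    """Run one round of a simplified GSP auction (assumed rules, see lead-in).

    Returns a list of (bidder_index, price_per_click, slot_ctr), one per filled slot.
    """
    # Keep only bidders whose bid meets their own reserve price.
    eligible = [
        (bid, reserve, idx)
        for idx, (bid, reserve) in enumerate(zip(bids, reserves))
        if bid >= reserve
    ]
    # Rank by bid, highest first; the sort is stable, so ties keep index order.
    eligible.sort(key=lambda entry: -entry[0])

    results = []
    for pos, ctr in enumerate(slot_ctrs):
        if pos >= len(eligible):
            break  # more slots than eligible bidders
        _, reserve, idx = eligible[pos]
        # Next eligible bid below this position, or 0 if there is none.
        next_bid = eligible[pos + 1][0] if pos + 1 < len(eligible) else 0.0
        # The winner pays the larger of the next bid and their own reserve.
        price = max(next_bid, reserve)
        results.append((idx, price, ctr))
    return results


def expected_revenue(results: List[Tuple[int, float, float]]) -> float:
    """Expected revenue per impression: sum of price times click-through rate."""
    return sum(price * ctr for _, price, ctr in results)


# Hypothetical example: three bidders, two slots.
bids = [4.0, 3.0, 1.0]
slot_ctrs = [0.5, 0.25]
no_reserve = expected_revenue(gsp_with_reserve(bids, [0.0, 0.0, 0.0], slot_ctrs))
with_reserve = expected_revenue(gsp_with_reserve(bids, [2.0, 2.0, 2.0], slot_ctrs))
print(no_reserve, with_reserve)
```

In this toy round, a reserve of 2.0 excludes the lowest bidder but raises the price paid for the second slot, so revenue goes up for fixed bids. The sketch holds bids fixed, whereas the approach described in the abstract predicts how bids change in response to reserve prices over repeated rounds and optimizes against that predicted behavior.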
