Reinforcement Learning of Simple Indirect Mechanisms

We introduce the use of reinforcement learning for the design of indirect mechanisms, working with the existing class of sequential price mechanisms, which generalizes both serial dictatorship and posted price mechanisms and essentially characterizes all strongly obviously strategyproof mechanisms. Learning an optimal mechanism within this class can be formulated as a partially observable Markov decision process. We provide rigorous conditions under which this class of mechanisms is more powerful than simpler static mechanisms, under which observation statistics are sufficient or insufficient for learning, and under which complex (deep) policies are necessary. We show that our approach can learn optimal or near-optimal mechanisms in several experimental settings.
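To make the mechanism class concrete, the following is a minimal sketch of how a sequential price mechanism might be simulated. It is an illustration under stated assumptions, not the authors' implementation. It assumes unit-demand agents who are visited one at a time, see posted prices for the remaining items, and take the item with the highest strictly positive utility. The function names, the policy signature, and the history format are all hypothetical. A learned policy would occupy the same slot as the two hand-written policies shown here.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical types for illustration only (not taken from the paper).
Prices = Dict[str, float]
Valuation = Dict[str, float]  # item -> value for a unit-demand agent
History = List[Tuple[int, Prices, Optional[str]]]
Policy = Callable[[List[int], Set[str], History], Tuple[int, Prices]]


def run_sequential_price_mechanism(
    valuations: List[Valuation], items: List[str], policy: Policy
) -> Tuple[float, float, History]:
    """Visit agents one at a time; each sees posted prices and may take one item."""
    remaining_agents = list(range(len(valuations)))
    available = set(items)
    history: History = []
    welfare, revenue = 0.0, 0.0
    while remaining_agents and available:
        # The policy only sees past offers and purchases, never the valuations.
        agent, prices = policy(remaining_agents, available, history)
        remaining_agents.remove(agent)
        # Assumption: the agent takes the item with the highest strictly
        # positive utility (value minus price), or nothing otherwise.
        best_item, best_utility = None, 0.0
        for item in sorted(available):
            utility = valuations[agent].get(item, 0.0) - prices[item]
            if utility > best_utility:
                best_item, best_utility = item, utility
        if best_item is not None:
            available.remove(best_item)
            welfare += valuations[agent][best_item]
            revenue += prices[best_item]
        history.append((agent, dict(prices), best_item))
    return welfare, revenue, history


def serial_dictatorship(remaining_agents, available, history):
    """Special case: fixed agent order, all prices zero."""
    return remaining_agents[0], {item: 0.0 for item in available}


def make_posted_price(fixed_prices: Prices) -> Policy:
    """Special case: fixed agent order, static prices that ignore the history."""
    def policy(remaining_agents, available, history):
        return remaining_agents[0], {item: fixed_prices[item] for item in available}
    return policy


if __name__ == "__main__":
    vals = [{"a": 5.0, "b": 3.0}, {"a": 4.0, "b": 1.0}]
    print(run_sequential_price_mechanism(vals, ["a", "b"], serial_dictatorship))
    print(run_sequential_price_mechanism(
        vals, ["a", "b"], make_posted_price({"a": 4.5, "b": 2.0})))
```

Both special cases ignore the `history` argument. A policy that conditions its next agent and prices on that history is what makes the mechanism adaptive, and because the policy observes only past offers and purchases rather than the agents' valuations, choosing it is a problem of acting under partial observability.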
