Reinforcement Learning of Sequential Price Mechanisms

We introduce the use of reinforcement learning for indirect mechanisms. We work with the existing class of sequential price mechanisms, which generalizes both serial dictatorship and posted price mechanisms and essentially characterizes all strongly obviously strategyproof mechanisms. Learning an optimal mechanism within this class forms a partially observable Markov decision process. We provide rigorous conditions for when this class of mechanisms is more powerful than simpler static mechanisms, for the sufficiency or insufficiency of observation statistics for learning, and for the necessity of complex (deep) policies. We show that our approach can learn optimal or near-optimal mechanisms in several experimental settings.
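As an illustration only, the Python below is a minimal sketch, not the paper's implementation, of one episode of a sequential price mechanism. It rests on assumptions introduced here: agents are unit-demand; the designer's policy observes only the sets of unvisited agents and unallocated items (never the valuations, which is what makes the problem partially observable); a visited agent takes a utility-maximizing item only when its utility is strictly positive, with ties going to the lowest item index; and the episode reward is social welfare. The names `run_spm`, `serial_dictatorship`, and `posted_price` are hypothetical. Serial dictatorship (all prices zero) and a posted price mechanism (one fixed price) appear as fixed, non-adaptive policies.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Tuple

# Hypothetical observation: the designer sees only which agents are still
# unvisited and which items are still unallocated, never the valuations.
Observation = Tuple[FrozenSet[int], FrozenSet[int]]
# Hypothetical policy signature: observation -> (next agent, price per remaining item).
Policy = Callable[[Observation], Tuple[int, Dict[int, float]]]


def run_spm(valuations: Sequence[Sequence[float]], policy: Policy):
    """Simulate one episode of a sequential price mechanism (unit-demand agents).

    Returns (allocation, payments, welfare); allocation[i] is the item agent i
    received, or None if agent i took nothing.
    """
    n, m = len(valuations), len(valuations[0])
    remaining_agents = set(range(n))
    remaining_items = set(range(m))
    allocation: List[Optional[int]] = [None] * n
    payments = [0.0] * n
    while remaining_agents and remaining_items:
        obs = (frozenset(remaining_agents), frozenset(remaining_items))
        agent, prices = policy(obs)
        remaining_agents.remove(agent)  # each agent is visited at most once
        # The visited agent takes an item of maximum utility v - p, but only if
        # that utility is strictly positive (assumption); ties go to the lowest index.
        best_item, best_util = None, 0.0
        for j in sorted(remaining_items):
            util = valuations[agent][j] - prices[j]
            if util > best_util:
                best_item, best_util = j, util
        if best_item is not None:
            allocation[agent] = best_item
            payments[agent] = prices[best_item]
            remaining_items.remove(best_item)
    # Reward for the designer's episode: social welfare (assumed objective).
    welfare = sum(valuations[i][j] for i, j in enumerate(allocation) if j is not None)
    return allocation, payments, welfare


def serial_dictatorship(obs: Observation) -> Tuple[int, Dict[int, float]]:
    """Fixed agent order (lowest index first) with every price set to zero."""
    agents, items = obs
    return min(agents), {j: 0.0 for j in items}


def posted_price(price: float) -> Policy:
    """Fixed agent order with the same posted price on every item."""
    def policy(obs: Observation) -> Tuple[int, Dict[int, float]]:
        agents, items = obs
        return min(agents), {j: price for j in items}
    return policy


if __name__ == "__main__":
    vals = [[3.0, 1.0], [4.0, 2.0]]
    print(run_spm(vals, serial_dictatorship))  # ([0, 1], [0.0, 0.0], 5.0)
    print(run_spm(vals, posted_price(2.5)))    # ([0, None], [2.5, 0.0], 3.0)
```

In this sketch, replacing the hand-written policies with a learned one (for example, a neural network trained by a policy-gradient method on the welfare reward) gives the kind of reinforcement learning problem the abstract describes. Because the policy sees only `obs`, the choice of observation statistic bounds what any learned policy can do.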
