Achieving Diverse Objectives with AI-driven Prices in Deep Reinforcement Learning Multi-agent Markets

We propose a practical approach to computing market prices and allocations via a deep reinforcement learning policymaker agent, operating in an environment of other learning agents. Compared to the idealized market equilibrium outcome, which we use as a benchmark, our policymaker is much more flexible, allowing us to tune prices toward diverse objectives such as sustainability, reduced resource wastefulness, fairness, and buyers’ and sellers’ welfare. To evaluate our approach, we design a realistic market with multiple, diverse buyers and sellers. The sellers, which are deep learning agents themselves, additionally compete for resources in a common-pool appropriation environment based on bio-economic models of commercial fisheries. We demonstrate that: (a) the introduced policymaker achieves performance comparable to the market equilibrium, showcasing the potential of such approaches in markets where equilibrium prices cannot be computed efficiently; (b) our policymaker can notably outperform the equilibrium solution on certain metrics while maintaining comparable performance on the remaining ones; and (c) as a highlight of our findings, our policymaker is significantly more successful than the market outcome at maintaining resource sustainability in scarce-resource environments.
