Adaptive Incentive Design with Multi-Agent Meta-Gradient Reinforcement Learning

Critical sectors of human society are progressing toward the adoption of powerful artificial intelligence (AI) agents, which are trained individually on behalf of self-interested principals but deployed in a shared environment. Short of direct centralized regulation of AI, which is as difficult an issue as regulation of human actions, one must design institutional mechanisms that indirectly guide agents’ behaviors to safeguard and improve social welfare in the shared environment. Our paper focuses on one important class of such mechanisms: the problem of adaptive incentive design, whereby a central planner intervenes on the payoffs of an agent population via incentives in order to optimize a system objective. To tackle this problem in high-dimensional environments whose dynamics may be unknown or too complex to model, we propose a model-free meta-gradient method to learn an adaptive incentive function in the context of multi-agent reinforcement learning. Via the principle of online cross-validation, the incentive designer explicitly accounts for its impact on agents’ learning and, through them, the impact on future social welfare. Experiments on didactic benchmark problems show that the proposed method can induce selfish agents to learn near-optimal cooperative behavior and significantly outperform learning-oblivious baselines. When applied to a complex simulated economy, the proposed method finds tax policies that achieve a better trade-off between economic productivity and equality than the baselines, a result that we interpret via a detailed behavioral analysis.
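The core idea of differentiating future social welfare through the agents' own learning step can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a hypothetical one-shot public-goods game, replaces sampled trajectories with closed-form expected payoffs (so it is not model-free), ignores the cost of paying incentives, and uses a single scalar incentive `eta` per unit of contribution. All names and constants below are illustrative assumptions.

```python
import math

# Hypothetical toy setup (not from the paper): a one-shot public-goods game.
N_AGENTS = 4       # population size (assumption)
MULTIPLIER = 1.6   # 1 < MULTIPLIER < N_AGENTS makes contributing a social dilemma
AGENT_LR = 1.0     # agents' policy-gradient step size (assumption)
DESIGNER_LR = 1.0  # incentive designer's step size (assumption)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def agent_update(thetas, eta):
    """One exact policy-gradient step per selfish agent.

    Agent i contributes with probability p_i = sigmoid(theta_i). Its expected
    incentivized payoff is share * sum_j p_j - p_i + eta * p_i, whose
    derivative w.r.t. theta_i is (share - 1 + eta) * p_i * (1 - p_i).
    """
    share = MULTIPLIER / N_AGENTS
    updated = []
    for theta in thetas:
        p = sigmoid(theta)
        updated.append(theta + AGENT_LR * (share - 1.0 + eta) * p * (1.0 - p))
    return updated


def welfare(thetas):
    """Expected social welfare excluding incentives: (MULTIPLIER - 1) * sum_i p_i."""
    return (MULTIPLIER - 1.0) * sum(sigmoid(theta) for theta in thetas)


def meta_gradient(thetas, eta):
    """d welfare(agent_update(thetas, eta)) / d eta, by the chain rule.

    The designer differentiates post-update welfare through each agent's
    learning step instead of treating the agents' policies as fixed.
    """
    updated = agent_update(thetas, eta)
    grad = 0.0
    for theta, theta_new in zip(thetas, updated):
        p, p_new = sigmoid(theta), sigmoid(theta_new)
        d_welfare_d_theta_new = (MULTIPLIER - 1.0) * p_new * (1.0 - p_new)
        d_theta_new_d_eta = AGENT_LR * p * (1.0 - p)
        grad += d_welfare_d_theta_new * d_theta_new_d_eta
    return grad


def train(steps=100):
    """Alternate agent updates and designer meta-gradient ascent on eta."""
    thetas = [0.0] * N_AGENTS  # every agent starts at a 50% contribution rate
    eta = 0.0                  # no incentive initially
    for _ in range(steps):
        grad = meta_gradient(thetas, eta)   # evaluated at the current eta
        thetas = agent_update(thetas, eta)  # agents learn under the current eta
        eta += DESIGNER_LR * grad           # designer ascends post-update welfare
    return thetas, eta
```

In this toy, a selfish agent's payoff gradient is negative whenever `eta` is below `1 - MULTIPLIER / N_AGENTS`, so agents drift toward defection until the designer has raised the incentive past that threshold. A learning-oblivious designer that differentiated welfare with respect to `eta` while holding the agents' policies fixed would see a zero gradient here, since welfare as defined does not depend on `eta` directly.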
