Coordinating the Crowd: Inducing Desirable Equilibria in Non-Cooperative Systems

Many real-world systems, such as taxi fleets, traffic networks, and smart grids, involve self-interested actors performing individual tasks in a shared environment. In such systems, self-interested behaviour often produces welfare-inefficient, globally suboptimal outcomes that harm everyone: common examples are congestion in traffic networks, demand spikes for resources in electricity grids, and over-extraction of environmental resources such as fisheries. We propose an incentive-design method that modifies agents' rewards in non-cooperative multi-agent systems so that independent, self-interested agents choose actions that produce optimal system outcomes in strategic settings. Our framework combines multi-agent reinforcement learning, which simulates (real-world) agent behaviour, with black-box optimisation, which determines the modifications to the agents' rewards or incentives that, under a fixed budget, yield optimal system performance. By modifying the reward functions and generating the agents' equilibrium responses in a sequence of offline Markov games, our method determines optimal incentive structures offline through iterative updates of the reward functions of a simulated game. Our theoretical results show that the method converges to reward modifications that induce system optimality. We demonstrate the framework on a challenging economic problem involving thousands of selfish agents and on a traffic congestion problem.
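To make the bi-level structure concrete, here is a minimal sketch of the loop the abstract describes: an outer black-box optimiser searches over budget-constrained reward modifications, while an inner routine returns the system welfare at the agents' equilibrium response to each candidate modification. Everything here is an illustrative assumption, not the authors' code: random search stands in for the black-box optimiser, and `simulate_equilibrium_welfare` is a hypothetical stub for the inner multi-agent reinforcement learning step, replaced by a toy congestion-style objective.

```python
import numpy as np

# Hypothetical sketch of the paper's bi-level loop (not the authors' code):
# an outer black-box optimiser searches over reward modifications subject to
# a budget, while an inner routine simulates self-interested agents reaching
# an equilibrium of the modified Markov game and reports system welfare.

rng = np.random.default_rng(0)
N_AGENTS, BUDGET, OUTER_ITERS = 10, 1.0, 200

def simulate_equilibrium_welfare(incentives: np.ndarray) -> float:
    """Stand-in for the inner MARL step: in the real method, agents learn
    equilibrium responses to the modified rewards. Here a toy congestion-
    style objective replaces the learned game: incentives shift agent
    loads, and welfare rewards balanced, low total load."""
    loads = 1.0 / (1.0 + incentives)            # incentives reduce each agent's load
    return -np.var(loads) - 0.1 * loads.sum()   # higher is better

best_theta, best_welfare = None, -np.inf
for _ in range(OUTER_ITERS):
    # Sample a candidate reward modification that exactly spends the budget
    # (a simple random-search stand-in for Bayesian optimisation).
    theta = rng.dirichlet(np.ones(N_AGENTS)) * BUDGET
    welfare = simulate_equilibrium_welfare(theta)   # inner equilibrium query
    if welfare > best_welfare:
        best_theta, best_welfare = theta, welfare

print(f"best welfare {best_welfare:.4f} with incentives {best_theta.round(3)}")
```

In the full method, each outer query would trigger a fresh offline Markov game in which the simulated agents re-learn their equilibrium under the candidate rewards, and the black-box optimiser would exploit all past (incentive, welfare) pairs rather than sampling blindly.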
