End-to-End Learning and Intervention in Games

In a social system, the self-interest of agents can be detrimental to the collective good, sometimes leading to social dilemmas. To resolve such conflicts, a central designer may intervene by either redesigning the system or incentivizing the agents to change their behavior. To be effective, the designer must anticipate how the agents will react to the intervention, which is dictated by their often unknown payoff functions. Learning about the agents is therefore a prerequisite for intervention. In this paper, we provide a unified framework for learning and intervention in games. We cast the equilibria of games as individual layers and integrate them into an end-to-end optimization framework. To enable backpropagation through the equilibria of games, we propose two approaches, based on explicit and implicit differentiation, respectively. Specifically, we cast the equilibria as solutions to variational inequalities (VIs). The explicit approach unrolls the projection method for solving VIs, while the implicit approach exploits the sensitivity of the solutions to VIs. At the core of both approaches is differentiation through a projection operator. Moreover, we establish the correctness of both approaches and identify the conditions under which one approach is preferable to the other. The analytical results are validated on several real-world problems.
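
To make the two differentiation routes concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical two-player game with an affine operator F(x, theta) = A x + theta, a box feasible set [0, 1]^2, and a fixed step size; the names `A`, `R`, `solve_unrolled`, and `implicit_jacobian` are illustrative. The explicit route propagates the sensitivity dx/dtheta through every step of the projection iteration (written here in forward mode for brevity, whereas backpropagation would traverse the same chain in reverse). The implicit route differentiates the fixed-point condition x* = P(x* - R F(x*, theta)) once at the solution.

```python
import numpy as np

# Assumed toy problem: F(x, theta) = A @ x + theta on the box [LO, HI]^n.
# A has a positive definite symmetric part, so the VI is strongly monotone.
A = np.array([[2.0, 1.0], [0.5, 2.0]])
LO, HI = 0.0, 1.0
R = 0.1  # step size of the projection method (assumed small enough to contract)


def F(x, theta):
    return A @ x + theta


def project(z):
    """Euclidean projection onto the box."""
    return np.clip(z, LO, HI)


def project_jac(z):
    """Jacobian of the box projection: 1 on strictly interior coordinates, else 0."""
    return np.diag(((z > LO) & (z < HI)).astype(float))


def solve_unrolled(theta, n_iters=500):
    """Explicit route: iterate x <- P(x - R F(x)) and carry J = dx/dtheta along."""
    n = theta.size
    eye = np.eye(n)
    x = np.full(n, 0.5)
    J = np.zeros((n, n))
    for _ in range(n_iters):
        z = x - R * F(x, theta)
        D = project_jac(z)
        # Chain rule through one projection step: dz/dtheta = J - R (A J + I).
        J = D @ (J - R * (A @ J + eye))
        x = project(z)
    return x, J


def implicit_jacobian(x_star, theta):
    """Implicit route: solve (I - D (I - R A)) J = -R D at the equilibrium."""
    n = theta.size
    eye = np.eye(n)
    D = project_jac(x_star - R * F(x_star, theta))
    return np.linalg.solve(eye - D @ (eye - R * A), -R * D)


theta = np.array([-1.0, 1.0])
x_star, J_unrolled = solve_unrolled(theta)
J_implicit = implicit_jacobian(x_star, theta)
print("equilibrium:", x_star)
print("unrolled dx/dtheta:\n", J_unrolled)
print("implicit dx/dtheta:\n", J_implicit)
```

In this sketch both routes rely on the same ingredient, the Jacobian of the projection. The unrolled version applies it at every iterate, so its cost grows with the number of iterations, while the implicit version applies it once at the solution and pays for a single linear solve.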
