Coordinating Followers to Reach Better Equilibria: End-to-End Gradient Descent for Stackelberg Games

A growing body of work in game theory extends the traditional Stackelberg game to settings with one leader and multiple followers who play a Nash equilibrium. Standard approaches for computing equilibria in these games reformulate the followers' best responses as constraints in the leader's optimization problem. These reformulation approaches can sometimes be effective, but they impose limiting assumptions on the followers' objectives and on the equilibrium the followers reach, e.g., uniqueness, optimism, or pessimism. To overcome these limitations, we update the leader's strategy by gradient descent, differentiating through the equilibrium reached by the followers. Our approach generalizes to any stochastic equilibrium selection procedure that chooses among multiple equilibria: we compute a stochastic gradient by back-propagating through a sampled Nash equilibrium, and we use the solution to a partial differential equation to establish that this stochastic gradient estimate is unbiased. Using the unbiased gradient estimate, we apply our gradient-based approach to three Stackelberg problems with multiple followers, where it consistently outperforms existing baselines and achieves higher utility for the leader.
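
The following minimal JAX sketch illustrates the core end-to-end idea under simplifying assumptions: two followers with hypothetical quadratic utilities reach a unique Nash equilibrium via best-response iteration, and the leader's action is updated by gradient ascent that back-propagates through the (unrolled) equilibrium computation. The utilities, function names, and step sizes are illustrative assumptions, not the paper's implementation, which additionally handles stochastic selection among multiple equilibria.

```python
import jax
import jax.numpy as jnp

# Hypothetical followers' game: follower i chooses y_i to maximize
#   u_i(y_i, y_j, x) = -0.5 * y_i**2 + y_i * (x + 0.5 * y_j),
# given the leader's action x, so each follower's best response is
#   y_i = x + 0.5 * y_j.
# The followers' Nash equilibrium is the fixed point of the simultaneous
# best-response map.

def best_response(y, x):
    """One round of simultaneous best responses by the two followers."""
    return jnp.array([x + 0.5 * y[1], x + 0.5 * y[0]])

def solve_equilibrium(x, iters=100):
    """Approximate the followers' Nash equilibrium by best-response iteration
    (a contraction here, so the iterates converge to the unique fixed point)."""
    y = jnp.zeros(2)
    for _ in range(iters):
        y = best_response(y, x)
    return y

def leader_utility(x):
    """Leader's utility depends on its own action and the induced equilibrium."""
    y = solve_equilibrium(x)
    return -(x - 1.0) ** 2 - 0.1 * jnp.sum(y ** 2)

# End-to-end gradient ascent on the leader's action: jax.grad differentiates
# through the unrolled equilibrium computation inside leader_utility.
grad_fn = jax.grad(leader_utility)
x = 0.0
for _ in range(200):
    x = x + 0.05 * grad_fn(x)

print("leader action:", x)
print("followers' equilibrium:", solve_equilibrium(x))
```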
