Nash Equilibrium Seeking in Noncooperative Games

We introduce a non-model-based approach for locally stable convergence to Nash equilibria in static, noncooperative games with N players. In classical game-theoretic algorithms, each player uses knowledge of the functional form of their own payoff and of the other players' actions, whereas in the proposed algorithm, the players need to measure only their own payoff values. This strategy is based on the extremum seeking approach, which was previously developed for standard optimization problems and employs sinusoidal perturbations to estimate the gradient. We first consider static games with quadratic payoff functions and then generalize our results to games with non-quadratic payoff functions that are the output of a dynamic system. Specifically, we consider general nonlinear differential equations with N inputs and N outputs, where, in the steady state, the output signals represent the payoff functions of a noncooperative game played by the steady-state values of the input signals. We employ standard local averaging theory and obtain local convergence results both for quadratic payoffs, where the actual convergence is semi-global, and for non-quadratic payoffs, where the potential existence of multiple Nash equilibria precludes semi-global convergence. Our convergence conditions coincide with conditions that arise in model-based Nash equilibrium seeking. However, in our framework the user is not meant to check these conditions, because the payoff functions are presumed to be unknown. For non-quadratic payoffs, convergence to a Nash equilibrium is not perfect but is biased in proportion to the perturbation amplitudes and the higher derivatives of the payoff functions. We quantify the size of these residual biases and confirm their existence numerically in an example noncooperative game. In this example, we present the first application of extremum seeking with projection to ensure that the players' actions remain in a given closed and bounded action set.
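
To make the idea of payoff-only Nash seeking concrete, the following is a minimal sketch of a sinusoidal-perturbation extremum seeking loop for two players with quadratic payoffs. It is an illustration under stated assumptions, not the algorithm exactly as analyzed in the paper: the payoff coefficients, the gain `k`, the amplitude `a`, the frequencies in `omegas`, the forward-Euler integration, and the names `payoffs` and `simulate` are all hypothetical choices made for this example. Each player adds a sinusoid to its action estimate, measures only its own payoff value, and integrates the product of the two. For the payoffs chosen here, setting each player's own partial derivative to zero gives the Nash equilibrium (1.6, 2.4).

```python
import math


def payoffs(u1, u2):
    """Hypothetical quadratic payoffs; each is concave in the player's own action."""
    J1 = -(u1 - 1.0) ** 2 + 0.5 * u1 * u2
    J2 = -(u2 - 2.0) ** 2 + 0.5 * u1 * u2
    return J1, J2


def simulate(T=100.0, dt=1e-3, k=5.0, a=0.2, omegas=(50.0, 70.0)):
    """Run the seeking loop and return the time-averaged action estimates.

    Assumed update for player i (forward Euler):
        u_i      = u_hat_i + a * sin(omega_i * t)
        u_hat_i += dt * k * a * sin(omega_i * t) * J_i(u_1, u_2)
    The players use distinct frequencies so that, on average, each one
    correlates its payoff only with its own perturbation.
    """
    u_hat = [0.0, 0.0]           # initial action estimates
    n = round(T / dt)
    tail_start = int(0.8 * n)    # average over the last 20% to smooth the ripple
    tail_sum = [0.0, 0.0]

    for step in range(n):
        t = step * dt
        mu = [a * math.sin(w * t) for w in omegas]      # perturbation signals
        J = payoffs(u_hat[0] + mu[0], u_hat[1] + mu[1])  # measured payoff values
        for i in range(2):
            # Player i sees only its own payoff value J[i], not its functional form.
            u_hat[i] += dt * k * mu[i] * J[i]
        if step >= tail_start:
            tail_sum[0] += u_hat[0]
            tail_sum[1] += u_hat[1]

    m = n - tail_start
    return tail_sum[0] / m, tail_sum[1] / m


if __name__ == "__main__":
    u1, u2 = simulate()
    print(f"averaged estimates: u1 = {u1:.3f}, u2 = {u2:.3f}")
```

With these parameters the averaged estimates should settle near (1.6, 2.4), with a small residual ripple at the perturbation frequencies. The sketch omits refinements such as washout filters and the projection onto a bounded action set mentioned in the abstract.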
