Mixed-Strategy Learning With Continuous Action Sets

Motivated by recent applications of game-theoretic learning to the design of distributed control systems, we study a class of control problems that can be formulated as potential games with continuous action sets. We propose an actor-critic reinforcement learning algorithm that adapts mixed strategies over continuous action spaces. To analyze the algorithm, we extend the theory of finite-dimensional two-timescale stochastic approximation to a Banach-space setting, and we prove that the resulting continuous dynamics converge to equilibrium in the case of potential games. Together, these results yield a provably convergent learning algorithm in which players need not keep track of the controls selected by other agents.
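The abstract describes a two-timescale actor-critic scheme: a critic tracks payoffs on a fast timescale while the actor adapts a mixed strategy on a slow one. The following is only an illustrative sketch of that general idea, not the paper's algorithm: it uses a single player, a hypothetical noisy quadratic payoff, and a discretized action grid with a logit mixed strategy, all of which are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical payoff u(a) = -(a - 0.7)^2 on [0, 1], observed with noise.
def noisy_payoff(a):
    return -(a - 0.7) ** 2 + 0.05 * rng.standard_normal()

grid = np.linspace(0.0, 1.0, 21)   # discretized action set (for illustration only)
q = np.zeros_like(grid)            # critic: running payoff estimates per action
logits = np.zeros_like(grid)       # actor: unnormalized log-probabilities

def mixed_strategy(logits, temp=0.1):
    # Logit (softmax) choice rule over the action grid.
    z = (logits - logits.max()) / temp
    p = np.exp(z)
    return p / p.sum()

for n in range(1, 20001):
    p = mixed_strategy(logits)
    i = rng.choice(len(grid), p=p)   # sample an action from the mixed strategy
    r = noisy_payoff(grid[i])
    # Two timescales: the critic's step size decays more slowly than the
    # actor's, so the actor effectively sees a converged payoff estimate.
    alpha = 1.0 / n ** 0.6           # fast (critic) step size
    beta = 1.0 / n ** 0.9            # slow (actor) step size
    q[i] += alpha * (r - q[i])       # critic: track the payoff of the played action
    logits += beta * (q - logits)    # actor: drift toward the critic's estimates

best = grid[np.argmax(mixed_strategy(logits))]
```

With these step-size exponents, the critic update averages out the payoff noise before the actor commits probability mass, which is the intuition behind the timescale separation; the strategy concentrates near the payoff maximizer at a = 0.7.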
