Multi-agent graphical games with input constraints: an online learning solution

This paper studies an online iterative algorithm for solving discrete-time multi-agent dynamic graphical games with input constraints. Obtaining the optimal strategy of each agent requires solving a set of coupled Hamilton-Jacobi-Bellman (HJB) equations, which is very difficult with traditional methods, and the problem becomes still more complex when the control input of each agent is constrained. This paper proposes an online iterative algorithm that solves the dynamic graphical game without requiring the drift dynamics of the agents; in effect, the algorithm finds the optimal solution of the coupled Bellman equations online. The solution employs a distributed policy-iteration process that uses only the local information available to each agent. It is proved that, under certain conditions, when all agents update their strategies simultaneously, the multi-agent system reaches a Nash equilibrium. In the implementation, each agent uses two neural networks to approximate its value function and its control strategy, respectively. Finally, a simulation example demonstrates the effectiveness of the method.
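As a rough illustration of the structure described above, the sketch below implements a simplified actor-critic policy iteration for three agents on a fixed communication graph. It is a minimal sketch, not the paper's algorithm: it assumes linear local dynamics (known here for brevity, whereas the paper's method avoids the drift dynamics), quadratic critic features in place of a general neural network, and a tanh-saturated actor to enforce the input constraint; all matrices, gains, and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative problem data (assumptions, not taken from the paper) ---
n, m = 2, 1                 # state and input dimensions per agent
N = 3                       # number of agents
u_max = 1.0                 # symmetric input bound: |u| <= u_max
A = np.array([[1.0, 0.1],   # local dynamics, assumed known here for brevity
              [0.0, 0.9]])  # (the paper's method avoids the drift dynamics)
B = np.array([[0.0], [0.1]])
adj = np.array([[0, 1, 0],  # adjacency matrix of the communication graph;
                [1, 0, 1],  # row i lists the neighbors agent i can observe
                [0, 1, 0]])
Q, R, gamma = np.eye(n), np.eye(m), 0.95

def phi(e):
    """Quadratic critic features of the local neighborhood error."""
    z = e.ravel()
    return np.array([z[0] ** 2, z[0] * z[1], z[1] ** 2])

def policy(Wa_i, e):
    """Actor: tanh-saturated control, so |u| <= u_max by construction."""
    return u_max * np.tanh(Wa_i @ e.ravel()).reshape(m, 1)

def local_error(x, i):
    """Neighborhood error built from agent i's local information only."""
    return sum(adj[i, j] * (x[i] - x[j]) for j in range(N))

# One critic weight vector and one actor weight matrix per agent
Wc = [np.zeros(3) for _ in range(N)]
Wa = [0.01 * rng.standard_normal((m, n)) for _ in range(N)]

for sweep in range(50):                   # policy-iteration sweeps
    # Collect transitions under the current (slightly exploring) policies
    X = [[] for _ in range(N)]
    y = [[] for _ in range(N)]
    x = [rng.standard_normal((n, 1)) for _ in range(N)]
    for k in range(40):
        e = [local_error(x, i) for i in range(N)]
        u = [policy(Wa[i], e[i]) + 0.05 * rng.standard_normal((m, 1))
             for i in range(N)]
        x = [A @ x[i] + B @ u[i] for i in range(N)]
        e_next = [local_error(x, i) for i in range(N)]
        for i in range(N):
            cost = float(e[i].T @ Q @ e[i] + u[i].T @ R @ u[i])
            X[i].append(phi(e[i]) - gamma * phi(e_next[i]))  # TD regressor
            y[i].append(cost)
    # Policy evaluation: least-squares fit of each agent's critic weights
    for i in range(N):
        Wc[i] = np.linalg.lstsq(np.array(X[i]), np.array(y[i]), rcond=None)[0]
    # Policy improvement: finite-difference descent on each agent's
    # one-step-ahead cost (all agents update simultaneously)
    for i in range(N):
        e = rng.standard_normal((n, 1))
        eps, grad = 1e-4, np.zeros_like(Wa[i])

        def lookahead(W):
            u = policy(W, e)
            return float(u.T @ R @ u) + gamma * (Wc[i] @ phi(A @ e + B @ u))

        base = lookahead(Wa[i])
        for r in range(m):
            for c in range(n):
                Wp = Wa[i].copy()
                Wp[r, c] += eps
                grad[r, c] = (lookahead(Wp) - base) / eps
        Wa[i] = Wa[i] - 0.1 * grad

print("learned critic weights per agent:", [w.round(3) for w in Wc])
```

Each sweep alternates distributed policy evaluation (a least-squares fit of each agent's critic from locally observed transitions) with a simultaneous policy-improvement step, mirroring the evaluation/improvement structure described in the abstract; the tanh saturation is one standard way to keep the control inside a symmetric input bound.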
