Approximate dynamic programming solutions of multi-agent graphical games using actor-critic network structures

This paper studies a new class of multi-agent discrete-time dynamical graphical games, in which interactions between agents are restricted by a communication graph structure. The paper brings together discrete Hamiltonian mechanics, optimal control theory, cooperative control, game theory, reinforcement learning, and neural network structures to solve these games. Graphical game Bellman equations are derived and shown to be equivalent to certain graphical game Hamilton-Jacobi-Bellman equations developed herein. Reinforcement learning techniques are used to solve the dynamical graphical games. Heuristic Dynamic Programming and Dual Heuristic Programming are extended to solve the graphical games using only neighborhood information. An online adaptive learning structure is implemented using actor-critic networks to solve these graphical games.
