Action Dependent Dual Heuristic Programming Solution for the Dynamic Graphical Games

The framework of graphical games is employed to solve the cooperative control problem for multi-agent systems interacting on graphs. The need for faster solution mechanisms has motivated new approaches based on Dual Heuristic Programming and Action Dependent Dual Heuristic Programming. This class of gradient-based solutions faces two main challenges. First, the gradient-based solving structures require complex update expressions. Second, they may overlook local neighborhood information if simpler costate expressions are enforced. A novel approach based on Action Dependent Dual Heuristic Programming is developed to solve the dynamic graphical games and to address these concerns. The adaptive learning approach is implemented online by means of value iteration and neural networks. The approximation of the optimal policy requires no a priori knowledge of the agents' dynamics, while the value function gradient approximation is shown to depend only on the drift dynamics of the agents. The convergence of the adaptive learning approach is illustrated by a simulation example.
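To make the scheme concrete, the following is a minimal Python/NumPy sketch of one online Action Dependent Dual Heuristic Programming (ADDHP) value-iteration step for a single agent. It is not the paper's exact formulation: the quadratic utility weights Q and R, the linear-in-parameters critic and actor, the discount factor, and the simplified costate propagation term are all illustrative assumptions. The critic here approximates the costate (the value function gradient with respect to the local error and the action), which is the defining feature of the dual-heuristic family.

```python
import numpy as np

# Illustrative ADDHP value-iteration step for one agent in a dynamic
# graphical game. All dimensions, weights, and learning rates below
# are assumptions made for the sketch, not values from the paper.

rng = np.random.default_rng(0)

n, m = 4, 2                      # local neighborhood-error and control dimensions
Q = np.eye(n)                    # assumed state-error weight
R = 0.1 * np.eye(m)              # assumed control weight
gamma = 0.9                      # assumed discount factor
alpha_c, alpha_a = 0.05, 0.05    # critic/actor learning rates

# Linear-in-parameters approximators:
# critic approximates the costate lambda(e, u) = dV/d[e; u],
# actor approximates the control policy u(e).
Wc = 0.01 * rng.standard_normal((n + m, n + m))  # critic weights
Wa = 0.01 * rng.standard_normal((m, n))          # actor weights

def critic(e, u):
    """Approximate costate (value-gradient) at the error/action pair."""
    return Wc @ np.concatenate([e, u])

def actor(e):
    """Approximate control policy for the local neighborhood error."""
    return Wa @ e

def addhp_step(e, e_next):
    """One online ADDHP update.

    e      : current local neighborhood tracking error
    e_next : measured next error, so the actor update needs no
             dynamics model (the costate propagation below is a
             simplified stand-in for the graph-coupled term).
    """
    global Wc, Wa
    u = actor(e)
    z = np.concatenate([e, u])

    # Target costate: gradient of the one-step quadratic utility plus
    # the discounted propagated future costate (state part only here).
    grad_utility = np.concatenate([Q @ e, R @ u])
    lam_next = critic(e_next, actor(e_next))
    target = grad_utility + gamma * np.concatenate([lam_next[:n], np.zeros(m)])

    # Gradient-descent critic update on the costate residual.
    err_c = critic(e, u) - target
    Wc -= alpha_c * np.outer(err_c, z)

    # Actor update: push u toward stationarity of the Hamiltonian,
    # u = -R^{-1} * (costate w.r.t. u) in this simplified sketch.
    lam_u = critic(e, u)[n:]
    u_target = -np.linalg.solve(R, lam_u)
    Wa -= alpha_a * np.outer(u - u_target, e)

# Illustrative usage with a synthetic stand-in transition.
e = rng.standard_normal(n)
for _ in range(50):
    e_next = 0.95 * e + 0.1 * rng.standard_normal(n)
    addhp_step(e, e_next)
    e = e_next
```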
