Approximate optimal cooperative decentralized control for consensus in a topological network of agents with uncertain nonlinear dynamics

This paper combines graph theory with adaptive dynamic programming (ADP), used as a reinforcement learning (RL) framework, to determine forward-in-time, real-time, approximate optimal controllers for distributed multi-agent systems with uncertain nonlinear dynamics. A decentralized, continuous, time-varying control strategy is proposed that uses only local communication feedback from two-hop neighbors on a communication topology that has a spanning tree. An actor-critic-identifier architecture is proposed in which a nonlinear state derivative estimator estimates the unknown dynamics online, and the resulting estimate is used for value function approximation.
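The two graph conditions named in the abstract (feedback restricted to two-hop neighbors, and a topology that has a spanning tree) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the topology is stored as a hypothetical dictionary `in_neighbors` mapping each agent to the set of agents it receives information from, "two-hop neighbors" is read as the agents reachable in at most two such steps, and the function names `two_hop_neighbors` and `has_spanning_tree` are invented here.

```python
from collections import deque


def two_hop_neighbors(in_neighbors, agent):
    """Agents whose information reaches `agent` in at most two hops.

    Assumed convention: in_neighbors[i] is the set of agents that
    agent i receives information from directly.
    """
    one_hop = set(in_neighbors.get(agent, ()))
    reachable = set(one_hop)
    for j in one_hop:
        # Neighbors of a neighbor are two hops away.
        reachable.update(in_neighbors.get(j, ()))
    reachable.discard(agent)  # an agent is not its own neighbor
    return reachable


def has_spanning_tree(in_neighbors):
    """True if some root agent can reach every other agent along
    the direction of information flow (a directed spanning tree)."""
    nodes = set(in_neighbors)
    for senders in in_neighbors.values():
        nodes.update(senders)
    # Reverse the receiver -> senders map into sender -> receivers.
    out_edges = {n: set() for n in nodes}
    for receiver, senders in in_neighbors.items():
        for sender in senders:
            out_edges[sender].add(receiver)
    for root in nodes:
        seen = {root}
        queue = deque([root])
        while queue:  # breadth-first search from the candidate root
            u = queue.popleft()
            for v in out_edges[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        if seen == nodes:
            return True
    return False


# Hypothetical four-agent chain: 1 -> 2 -> 3 -> 4.
chain = {1: set(), 2: {1}, 3: {2}, 4: {3}}
# Hypothetical topology where agent 3 is isolated.
split = {1: set(), 2: {1}, 3: set()}
```

With these hypothetical topologies, `two_hop_neighbors(chain, 4)` returns agents 2 and 3, `has_spanning_tree(chain)` is true because agent 1 reaches every other agent, and `has_spanning_tree(split)` is false because no single agent reaches all others.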
