Data-Driven Optimal Consensus Control for Discrete-Time Multi-Agent Systems With Unknown Dynamics Using Reinforcement Learning Method

This paper investigates the optimal consensus control problem for discrete-time multi-agent systems with completely unknown dynamics using a data-driven reinforcement learning method. Optimal consensus control for multi-agent systems relies on the solution of the coupled Hamilton–Jacobi–Bellman equation, which generally cannot be solved analytically. Moreover, most real-world systems are too complicated for accurate mathematical models to be obtained. To overcome these difficulties, a data-based adaptive dynamic programming method is presented that uses current and past system data rather than an accurate system model, and that avoids the traditional identification scheme, which would introduce approximation residual errors. First, we establish a discounted performance index and formulate the optimal consensus problem via Bellman's optimality principle. Then, we introduce the policy iteration algorithm that motivates this paper. To implement the proposed online action-dependent heuristic dynamic programming method, two neural networks (NNs), a critic NN and an actor NN, are employed to approximate the iterative performance index functions and control policies, respectively, in real time. Finally, two simulation examples are provided to demonstrate the effectiveness of the proposed method.
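
For orientation, the following sketch writes out the generic form that a discounted performance index, its Bellman equation, and a policy iteration step usually take in this setting. The abstract does not state these equations, so every symbol here is an assumption made for illustration: $e_i(k)$ is taken to be the local consensus error of agent $i$, $u_i(k)$ its control input, $u_{-i}(k)$ the inputs of its neighbors, $U_i$ a utility function, $\gamma$ a discount factor, and $l$ the iteration index. The paper's own formulation may differ.

```latex
% Assumed notation (not from the abstract): e_i, u_i, u_{-i}, U_i, \gamma, l.
\begin{align}
% Discounted performance index of agent i
J_i\bigl(e_i(k)\bigr) &= \sum_{t=k}^{\infty} \gamma^{\,t-k}\,
    U_i\bigl(e_i(t), u_i(t), u_{-i}(t)\bigr), \qquad 0 < \gamma \le 1, \\
% Equivalent recursive (Bellman) form
J_i\bigl(e_i(k)\bigr) &= U_i\bigl(e_i(k), u_i(k), u_{-i}(k)\bigr)
    + \gamma\, J_i\bigl(e_i(k+1)\bigr), \\
% Bellman optimality principle (coupled through u_{-i})
J_i^{*}\bigl(e_i(k)\bigr) &= \min_{u_i(k)} \Bigl\{
    U_i\bigl(e_i(k), u_i(k), u_{-i}(k)\bigr)
    + \gamma\, J_i^{*}\bigl(e_i(k+1)\bigr) \Bigr\}, \\
% Policy evaluation at iteration l
J_i^{[l]}\bigl(e_i(k)\bigr) &= U_i\bigl(e_i(k), u_i^{[l]}(k), u_{-i}^{[l]}(k)\bigr)
    + \gamma\, J_i^{[l]}\bigl(e_i(k+1)\bigr), \\
% Policy improvement at iteration l
u_i^{[l+1]}(k) &= \arg\min_{u_i} \Bigl\{
    U_i\bigl(e_i(k), u_i, u_{-i}^{[l]}(k)\bigr)
    + \gamma\, J_i^{[l]}\bigl(e_i(k+1)\bigr) \Bigr\}.
\end{align}
```

Under these assumptions, the critic NN would approximate $J_i^{[l]}$ and the actor NN would approximate $u_i^{[l]}$, which is consistent with the roles the abstract assigns to the two networks.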
