Multi-Agent Synchronization Using Online Model-Free Action Dependent Dual Heuristic Dynamic Programming Approach

Approximate dynamic programming platforms are employed to solve dynamic graphical games, in which agents interact over communication graphs in order to achieve synchronization. Although action-dependent dual heuristic dynamic programming schemes provide fast solution platforms for several control problems, their performance degrades for systems with unknown or uncertain dynamical models. An online model-free adaptive learning solution based on action-dependent dual heuristic dynamic programming is proposed to solve dynamic graphical games. It employs distributed actor-critic neural networks to approximate the optimal value function and the associated model-free control strategy for each agent. This is accomplished through a policy iteration process that avoids the extensive computational effort traditionally required. The duality between the model-free coupled Bellman optimality equation and the underlying coupled Riccati equation is highlighted, and a graph simulation scenario is used to test the usefulness of the proposed policy iteration process.
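To make the action-dependent idea concrete, below is a minimal single-agent sketch of model-free policy iteration on a quadratic Q-function, in the spirit of ADHDP: the Q-function Q(x, u) = [x; u]ᵀ H [x; u] is fit from measured transitions by least squares, so the update itself never touches the model matrices. The linear system, cost weights, discount factor, and sample counts are illustrative assumptions for the sketch; they are not the paper's multi-agent graphical-game formulation, in which each agent's value function couples to its graph neighbors.

```python
import numpy as np

# Illustrative single-agent sketch of action-dependent (Q-function) policy
# iteration. The simulator below plays the role of the real system: A and B
# are used only to generate data, never inside the learning update.
np.random.seed(0)
n, m, gamma = 2, 1, 0.9
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])          # "unknown" dynamics (data generation only)
B = np.array([[0.0],
              [0.1]])
Qc, Rc = np.eye(n), np.eye(m)       # stage-cost weights

def quad_basis(z):
    """Independent entries of z z^T (upper triangle), with z = [x; u]."""
    return np.outer(z, z)[np.triu_indices(len(z))]

K = np.zeros((m, n))                # initial admissible policy u = -K x
for _ in range(20):                 # policy iteration
    # Policy evaluation: solve the Bellman equation in least-squares form,
    # Q(x, u) = x'Qc x + u'Rc u + gamma * Q(x', -K x'), from sampled data.
    Phi, y = [], []
    for _ in range(60):
        x = np.random.randn(n, 1)
        u = -K @ x + 0.1 * np.random.randn(m, 1)   # exploration noise
        xn = A @ x + B @ u                         # measured next state
        z = np.vstack([x, u]).ravel()
        zn = np.vstack([xn, -K @ xn]).ravel()
        Phi.append(quad_basis(z) - gamma * quad_basis(zn))
        y.append(float(x.T @ Qc @ x + u.T @ Rc @ u))
    w = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)[0]

    # Rebuild the symmetric kernel H from the upper-triangular weights:
    # cross terms z_i z_j (i < j) carry weight 2*H_ij, so averaging with the
    # transpose splits them evenly.
    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = w
    H = (H + H.T) / 2

    # Policy improvement: u* = argmin_u Q(x, u) = -Huu^{-1} Hux x.
    Huu, Hux = H[n:, n:], H[n:, :n]
    K = np.linalg.solve(Huu, Hux)

print("learned gain K =", K)
```

For this linear-quadratic test case, the learned gain K converges to the gain that a model-based discounted Riccati solution would produce, which is the single-agent face of the Bellman-Riccati duality mentioned in the abstract.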
