Multi-agent reinforcement learning approach based on reduced value function approximations

This paper introduces a novel online adaptive Reinforcement Learning approach based on Policy Iteration for multi-agent systems interacting on graphs. The approach uses reduced value functions to solve the coupled Bellman and Hamilton-Jacobi-Bellman equations for multi-agent systems, requiring only partial knowledge of the agents' dynamics. The convergence of the approach is shown to depend on the properties of the communication graph. The Policy Iteration approach is implemented in real time using neural networks, where reduced value functions lower the computational complexity.
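The abstract gives no algorithmic detail beyond this outline, so the following is a minimal sketch of the structure it describes, not the paper's method. Each agent i maintains a reduced value function V_i(delta_i) = w_i * delta_i^2 defined only on its local neighbourhood tracking error delta_i, and Policy Iteration alternates a least-squares evaluation of the Bellman equation V_i(delta_i(k)) = Q*delta_i(k)^2 + R*u_i(k)^2 + gamma*V_i(delta_i(k+1)) with a greedy policy improvement. The scalar dynamics, the three-agent graph, the quadratic basis, and the use of a fully known local model in the improvement step are all illustrative assumptions; the paper itself claims to need only partial model knowledge.

import numpy as np

np.random.seed(0)

# Directed communication graph: E[i, j] = 1 if agent i receives agent j's state.
E = np.array([[0., 0., 0.],
              [1., 0., 0.],
              [0., 1., 0.]])
g = np.array([1., 0., 0.])   # pinning gains: only agent 0 observes the leader
d = E.sum(axis=1)            # in-degrees
N = len(g)

a, b = 1.05, 0.5             # local dynamics x(k+1) = a*x(k) + b*u(k); assumed
                             # fully known here for the improvement step
gamma, Q, R = 0.9, 1.0, 0.1  # discount factor and stage-cost weights

def local_errors(x, x0):
    # Neighbourhood tracking errors delta_i: the reduced coordinates on which
    # each agent's value function is defined.
    return (d + g) * x - E @ x - g * x0

K = np.full(N, 0.2)          # initial stabilising feedback gains (assumed)

for _ in range(15):
    # Policy evaluation: fit V_i(delta) = w_i * delta^2 by least squares on
    # Bellman residuals collected along a closed-loop trajectory.
    phi = [[] for _ in range(N)]   # regressors delta^2 - gamma * delta'^2
    rew = [[] for _ in range(N)]   # stage costs
    x, x0 = 2.0 * np.random.randn(N), 1.0
    for _ in range(60):
        delta = local_errors(x, x0)
        u = -K * delta
        x, x0 = a * x + b * u, a * x0          # leader evolves uncontrolled
        delta_next = local_errors(x, x0)
        for i in range(N):
            phi[i].append(delta[i]**2 - gamma * delta_next[i]**2)
            rew[i].append(Q * delta[i]**2 + R * u[i]**2)
    w = np.array([np.dot(phi[i], rew[i]) / (np.dot(phi[i], phi[i]) + 1e-12)
                  for i in range(N)])

    # Policy improvement: greedy in u_i with the neighbours' policies frozen,
    # using delta_i(k+1) ~ a*delta_i(k) + (d_i + g_i)*b*u_i(k) and treating
    # the neighbours' control terms as exogenous (a simplification).
    c = (d + g) * b
    K = gamma * w * c * a / (R + gamma * w * c**2)

print("reduced value weights w:", np.round(w, 3))
print("feedback gains K:       ", np.round(K, 3))

Because each V_i depends on delta_i alone rather than on the joint state, the least-squares fit is one-dimensional per agent; the coupling through the neighbours' controls appears only as residual error in the fit. That is the computational saving the reduced value functions are meant to provide.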
