Compositionality of Linearly Solvable Optimal Control in Networked Multi-Agent Systems

In this paper, we discuss a methodology for generalizing optimal control laws from learned component tasks to unlearned composite tasks in multi-agent systems (MASs), using the linear compositionality principle of linearly solvable optimal control (LSOC) problems. The proposed approach achieves compositionality and optimality of control actions simultaneously within the cooperative MAS framework, in both discrete and continuous time, and in a sample-efficient manner, removing the burden of recomputing optimal control solutions for each new task on the MAS. We investigate the application of the proposed approach to MASs with coordination between agents. Experiments show feasible results in the investigated scenarios, demonstrating task generalization without resampling for both discrete and continuous dynamical systems.
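The compositionality principle referenced above can be illustrated on a single-agent, discrete-time LSOC problem (a first-exit linearly solvable MDP in the sense of Todorov): the desirability function z(x) = exp(-v(x)) satisfies a linear equation in the interior states, so a composite task whose terminal desirability is a weighted sum of component terminal desirabilities has, by superposition, the weighted sum of the component desirability functions as its exact solution. The sketch below is illustrative only, on a hypothetical random-walk chain with made-up costs; the chain setup, cost values, and weights are our assumptions, not taken from the paper.

```python
import numpy as np

def solve_desirability(P_II, P_IB, qI, zB):
    # First-exit LMDP desirability equation on interior states:
    #   z_I = exp(-q_I) * (P_II @ z_I + P_IB @ z_B)
    # which is linear in z_I (and linear in the terminal values z_B).
    G = np.diag(np.exp(-qI))
    A = np.eye(len(qI)) - G @ P_II
    return np.linalg.solve(A, G @ (P_IB @ zB))

# Hypothetical example: a chain of N states; 0 and N-1 are terminal.
N = 7
interior = np.arange(1, N - 1)
nI = len(interior)

# Passive dynamics: unbiased random walk, split into interior-to-interior
# (P_II) and interior-to-terminal (P_IB) transition blocks.
P_II = np.zeros((nI, nI))
P_IB = np.zeros((nI, 2))
for k, s in enumerate(interior):
    for s2, p in ((s - 1, 0.5), (s + 1, 0.5)):
        if s2 == 0:
            P_IB[k, 0] += p
        elif s2 == N - 1:
            P_IB[k, 1] += p
        else:
            P_II[k, s2 - 1] += p

qI = 0.1 * np.ones(nI)  # uniform interior state cost (assumed value)

# Two component tasks: reach the left vs. the right terminal cheaply.
zB1 = np.exp(-np.array([0.0, 5.0]))  # task 1 terminal desirability
zB2 = np.exp(-np.array([5.0, 0.0]))  # task 2 terminal desirability
z1 = solve_desirability(P_II, P_IB, qI, zB1)
z2 = solve_desirability(P_II, P_IB, qI, zB2)

# Composite task: terminal desirability is a weighted sum of the
# component tasks' terminal desirabilities.
w1, w2 = 0.3, 0.7
zc = solve_desirability(P_II, P_IB, qI, w1 * zB1 + w2 * zB2)

# Compositionality: the composite solution equals the same weighted
# sum of the component solutions, with no re-solving per new weight.
assert np.allclose(zc, w1 * z1 + w2 * z2)
```

Because the map from terminal desirability to interior desirability is linear, any new composite task formed by reweighting learned component tasks inherits its optimal desirability (and hence its optimal control law) for free, which is the sample-efficiency argument the abstract makes, extended in the paper to the networked multi-agent setting.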
