Learning to Communicate: A Machine Learning Framework for Heterogeneous Multi-Agent Robotic Systems

We present a machine learning framework in which multi-agent systems learn both a policy that maximizes reward and an encoding of high-dimensional visual observations. The encoding allows agents to share their local visual observations with one another under communication resource constraints. The actor-encoder encodes raw images and chooses an action based on local observations and the messages sent by the other agents. Each learning agent thus generates not only an actuator command for its physical device but also a communication message for the other agents. We formulate a reinforcement learning problem whose action space is extended to include the communication action. The feasibility of the framework is demonstrated in a 3D simulation environment with two collaborating agents, which provides realistic visual observations to be used and shared between the agents.
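To make the architecture concrete, the sketch below shows one way such an actor-encoder could be wired up: a convolutional encoder compresses the raw image into a low-dimensional code, and a policy head maps the local code plus the incoming message to both an actuator command and an outgoing message. This is a minimal illustrative sketch, not the paper's implementation; the class name `ActorEncoder`, the dimensions `act_dim` and `msg_dim`, and all layer sizes are our assumptions.

```python
# Minimal sketch of an actor-encoder with an extended (action + message)
# output. All names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class ActorEncoder(nn.Module):
    """Encodes a raw image and emits (actuator command, communication message)."""

    def __init__(self, act_dim: int, msg_dim: int):
        super().__init__()
        # Convolutional encoder: compresses the high-dimensional visual
        # observation into a low-dimensional code that is cheap to transmit.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(128), nn.ReLU(),
        )
        # Policy trunk consumes the local code concatenated with the
        # message received from the other agent.
        self.policy = nn.Sequential(
            nn.Linear(128 + msg_dim, 128), nn.ReLU(),
        )
        self.action_head = nn.Linear(128, act_dim)   # actuator command
        self.message_head = nn.Linear(128, msg_dim)  # outgoing message

    def forward(self, image, incoming_msg):
        z = self.encoder(image)                           # local observation code
        h = self.policy(torch.cat([z, incoming_msg], dim=-1))
        action = torch.tanh(self.action_head(h))          # bounded control output
        message = torch.tanh(self.message_head(h))        # bounded message output
        return action, message

# Example: a batch of two 64x64 RGB frames plus an 8-dimensional incoming message.
net = ActorEncoder(act_dim=2, msg_dim=8)
act, msg = net(torch.zeros(2, 3, 64, 64), torch.zeros(2, 8))
```

Because the actuator command and the message together form a single continuous extended action vector, such a network could in principle be trained end to end with off-the-shelf continuous-control actor-critic methods.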
