State Representation Learning for Multi-agent Deep Deterministic Policy Gradient

Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is a widely used algorithm in multi-agent domains. MADDPG employs deep neural networks (DNNs) to approximate the Q-function. An advantage of DNNs is that they can represent complex functions of high-dimensional inputs; the drawback of such end-to-end learning is that it typically requires large amounts of data, which are not always available in real-world control applications. In this paper, we propose a new algorithm, State Representation Learning Multi-Agent Deep Deterministic Policy Gradient (SRL-MADDPG), which combines MADDPG with state representation learning, using DNNs for function approximation: a model-learning network is used to pre-train the first layer of the actor and critic networks, after which the actor and critic learn from the state representation instead of the raw observations. Simulation results show that SRL-MADDPG improves final performance compared with end-to-end learning.
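The paper does not include an implementation, but the pre-training scheme can be illustrated concretely. Below is a minimal PyTorch sketch, assuming per-agent transitions of the form (observation, action, next observation); the names StateEncoder, ForwardModel, and pretrain_encoder are illustrative and not from the paper, and the forward-model loss stands in for whatever model-learning objective the authors actually use.

```python
import torch
import torch.nn as nn

class StateEncoder(nn.Module):
    """First layer shared by actor and critic; pre-trained via model learning."""
    def __init__(self, obs_dim, repr_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, repr_dim), nn.ReLU())

    def forward(self, obs):
        return self.net(obs)

class ForwardModel(nn.Module):
    """Predicts the next state representation from (representation, action)."""
    def __init__(self, repr_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(repr_dim + act_dim, 64), nn.ReLU(),
            nn.Linear(64, repr_dim),
        )

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

def pretrain_encoder(encoder, model, transitions, epochs=10, lr=1e-3):
    """Fit the encoder and forward model on (obs, action, next_obs) tuples."""
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(model.parameters()), lr=lr
    )
    for _ in range(epochs):
        for obs, act, next_obs in transitions:
            z = encoder(obs)
            # Detach the target to stabilize training; in practice an extra
            # reconstruction or auxiliary loss may be needed to rule out
            # trivial (collapsed) representations.
            z_next = encoder(next_obs).detach()
            loss = nn.functional.mse_loss(model(z, act), z_next)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder
```

After pre-training, the encoder's weights would initialize the first layer of each agent's actor and critic networks, and the standard MADDPG updates then operate on the representation z = encoder(obs) rather than on the raw observations.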
