Dynamics-aware Embeddings

In this paper we consider self-supervised representation learning to improve sample efficiency in reinforcement learning (RL). We propose a forward prediction objective for simultaneously learning embeddings of states and action sequences. These embeddings capture the structure of the environment's dynamics, enabling efficient policy learning. We demonstrate that our action embeddings alone improve the sample efficiency and peak performance of model-free RL on control from low-dimensional states. By combining state and action embeddings, we achieve efficient learning of high-quality policies on goal-conditioned continuous control from pixel observations in only 1-2 million environment steps.
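The abstract does not specify the model, so the following is only a minimal sketch of a generic forward-prediction objective. It jointly learns a state embedding and an action-sequence embedding by predicting the state k steps ahead. The sketch rests on several assumptions that are not from the paper: linear encoders and predictor, a toy 2-D point-mass system where s_{t+k} = s_t + a_t + ... + a_{t+k-1}, and full-batch gradient descent with hand-written gradients. All names (`ForwardPredictionModel`, `make_point_mass_batch`, `train`) and all dimensions are hypothetical. Any additional terms in the paper's actual objective, and its network architectures, are not reproduced here.

```python
import random


def zeros(rows, cols):
    """Return a rows x cols matrix of zeros (list of lists)."""
    return [[0.0] * cols for _ in range(rows)]


def gaussian_matrix(rows, cols, rng, scale=0.3):
    """Return a rows x cols matrix with i.i.d. N(0, scale^2) entries."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ForwardPredictionModel:
    """Hypothetical linear sketch: a state encoder, an action-sequence encoder,
    and a predictor mapping both embeddings to the state k steps ahead."""

    def __init__(self, state_dim, action_dim, horizon, state_emb=4, action_emb=4, seed=0):
        rng = random.Random(seed)
        self.W_s = gaussian_matrix(state_emb, state_dim, rng)               # state encoder
        self.W_a = gaussian_matrix(action_emb, action_dim * horizon, rng)   # action-sequence encoder
        self.W_p = gaussian_matrix(state_dim, state_emb + action_emb, rng)  # forward predictor
        self.state_emb = state_emb

    def forward(self, s, actions):
        a = [x for act in actions for x in act]  # flatten a_t, ..., a_{t+k-1}
        z = matvec(self.W_s, s)                   # state embedding
        e = matvec(self.W_a, a)                   # action-sequence embedding
        h = z + e                                 # list concatenation [z; e]
        return a, h, matvec(self.W_p, h)          # prediction of s_{t+k}

    def loss_and_grads(self, batch):
        """Mean squared forward-prediction error and its gradients."""
        n, d = len(batch), len(self.W_p)
        g_s = zeros(len(self.W_s), len(self.W_s[0]))
        g_a = zeros(len(self.W_a), len(self.W_a[0]))
        g_p = zeros(len(self.W_p), len(self.W_p[0]))
        total = 0.0
        for s, actions, target in batch:
            a, h, pred = self.forward(s, actions)
            r = [p - y for p, y in zip(pred, target)]
            total += sum(ri * ri for ri in r) / d
            d_pred = [2.0 * ri / (d * n) for ri in r]  # dLoss/dpred
            for i, dp in enumerate(d_pred):
                for j, hj in enumerate(h):
                    g_p[i][j] += dp * hj
            # Backpropagate through the predictor into both embeddings.
            d_h = [sum(self.W_p[i][j] * d_pred[i] for i in range(d)) for j in range(len(h))]
            d_z, d_e = d_h[:self.state_emb], d_h[self.state_emb:]
            for i, dz in enumerate(d_z):
                for j, sj in enumerate(s):
                    g_s[i][j] += dz * sj
            for i, de in enumerate(d_e):
                for j, aj in enumerate(a):
                    g_a[i][j] += de * aj
        return total / n, (g_s, g_a, g_p)

    def gradient_step(self, grads, lr):
        """Update all three weight matrices in place."""
        for W, G in zip((self.W_s, self.W_a, self.W_p), grads):
            for row, g_row in zip(W, G):
                for j, g in enumerate(g_row):
                    row[j] -= lr * g


def make_point_mass_batch(n, horizon, rng):
    """Toy dynamics (an assumption): s_{t+k} = s_t + a_t + ... + a_{t+k-1} in 2-D."""
    batch = []
    for _ in range(n):
        s = [rng.uniform(-1, 1) for _ in range(2)]
        actions = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(horizon)]
        target = [s[i] + sum(act[i] for act in actions) for i in range(2)]
        batch.append((s, actions, target))
    return batch


def train(model, batch, steps=300, lr=0.05):
    """Full-batch gradient descent; returns the loss recorded before each step."""
    history = []
    for _ in range(steps):
        loss, grads = model.loss_and_grads(batch)
        history.append(loss)
        model.gradient_step(grads, lr)
    return history


data = make_point_mass_batch(32, horizon=3, rng=random.Random(1))
model = ForwardPredictionModel(state_dim=2, action_dim=2, horizon=3)
losses = train(model, data)
print(f"forward-prediction loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In this sketch the predictor targets the raw future state rather than the embedding of the future state. This is a deliberate design choice: if both sides of the loss were learned embeddings, the encoders could trivially minimize the loss by mapping everything to a constant. A model learned this way could supply the action-sequence embedding `e` as a compact action space for a downstream policy. Decoding `e` back into concrete actions would require an additional component that is not shown here.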
