Paper Information - Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research

The purpose of this technical report is two-fold. First of all, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware. The tasks include pushing, sliding and pick & place with a Fetch robotic arm as well as in-hand object manipulation with a Shadow Dexterous Hand. All tasks have sparse binary rewards and follow a Multi-Goal Reinforcement Learning (RL) framework in which an agent is told what to do using an additional input. The second part of the paper presents a set of concrete research ideas for improving RL algorithms, most of which are related to Multi-Goal RL and Hindsight Experience Replay.
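The two ideas the abstract names, a sparse binary reward tied to a goal that the agent receives as an additional input, and hindsight relabeling of goals, can be illustrated with a minimal sketch. Everything below is an illustrative assumption rather than the report's actual code: the `Transition` type, the function names, the 0/-1 reward convention, the `tolerance` value, and the choice of the final achieved goal for relabeling are all hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence


class Transition(NamedTuple):
    """One step of a goal-conditioned episode (hypothetical structure)."""
    observation: Sequence[float]
    achieved_goal: Sequence[float]  # goal state actually reached after the step
    desired_goal: Sequence[float]   # goal the agent was told to reach
    reward: float


def sparse_reward(achieved: Sequence[float], desired: Sequence[float],
                  tolerance: float = 0.05) -> float:
    """Binary reward: 0.0 if within tolerance of the goal, else -1.0.

    The 0/-1 convention and the tolerance are assumptions for this sketch.
    """
    return 0.0 if math.dist(achieved, desired) <= tolerance else -1.0


def policy_input(observation: Sequence[float],
                 desired_goal: Sequence[float]) -> List[float]:
    """The goal is passed to the agent as an additional input."""
    return list(observation) + list(desired_goal)


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Hindsight relabeling: treat the last achieved goal as if it had been
    the desired goal, and recompute every reward accordingly."""
    new_goal = episode[-1].achieved_goal
    return [
        t._replace(desired_goal=new_goal,
                   reward=sparse_reward(t.achieved_goal, new_goal))
        for t in episode
    ]
```

In this sketch, an episode that never reaches its original goal carries only failure rewards, but after relabeling its final transition counts as a success, which gives a learning signal despite the sparse reward.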

[1] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.

[2] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, ArXiv.

[3] Sergey Levine, et al. Temporal Difference Models: Model-Free Deep RL for Model-Based Control, 2018, ICLR.

[4] Yang Liu, et al. Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening, 2016, ICLR.

[5] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[6] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.

[7] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.

[8] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.

[9] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.

[10] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.

[11] Richard E. Turner, et al. Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning, 2017, NIPS.

[12] Kate Saenko, et al. Hierarchical Actor-Critic, 2017, ArXiv.

[13] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.

[14] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.

[15] Pieter Abbeel, et al. Reverse Curriculum Generation for Reinforcement Learning, 2017, CoRL.

[16] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.

[17] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[18] Filipe Wall Mutz, et al. Hindsight policy gradients, 2017, ICLR.

[19] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.