Autonomous Curriculum Generation for Self-Learning Agents

The applicability of deep reinforcement learning algorithms to robotics is limited by sample inefficiency. As in most machine learning methods, more samples generally mean better learning performance. In robotics, however, sample collection is time-consuming, and in real-world scenarios it raises safety concerns for both the robot and its surrounding environment. Because of these limitations, sample efficiency plays a vital role in robotic learning. Curriculum learning offers a methodology that reduces the required sample collection burden and keeps it to a minimum. This study tackles the sample inefficiency that deep reinforcement learning algorithms face in robotics by designing a curriculum. We propose an algorithm that decides on the sequence of tasks the agent must learn so that knowledge is transferred towards the target task in a sample-efficient manner. The algorithm uses a parameter-space representation of the tasks to estimate their difficulty; once the difficulty of each task is determined, easier tasks are learned before the final target task. We evaluate the approach on a double inverted pendulum setup. Simulation results show that transferring knowledge via the curriculum is more sample efficient than direct transfer.
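To make the curriculum idea concrete, the sketch below shows one minimal way such task sequencing could look in Python. It assumes each task is described by a vector of physical parameters (e.g., pole masses and lengths of the pendulum) and approximates difficulty by Euclidean distance in parameter space from an easy reference task; the function and parameter names (`order_tasks_by_difficulty`, `train_fn`, `easy_reference`) are hypothetical and the paper's actual difficulty measure and training procedure may differ.

```python
import numpy as np

def order_tasks_by_difficulty(task_params, easy_reference):
    """Return task indices sorted from easiest to hardest.

    Difficulty is approximated (assumption for illustration) by the Euclidean
    distance of each task's parameter vector from an easy reference task.
    """
    distances = [np.linalg.norm(np.asarray(p) - np.asarray(easy_reference))
                 for p in task_params]
    return list(np.argsort(distances))

def train_with_curriculum(tasks, easy_reference, train_fn):
    """Train on tasks from easy to hard, transferring the policy along the way.

    `train_fn(task, init_policy)` is a placeholder for any RL training routine
    (e.g., PPO) that returns the policy learned on that task.
    """
    policy = None  # no prior knowledge before the first (easiest) task
    ordering = order_tasks_by_difficulty([t["params"] for t in tasks],
                                         easy_reference)
    for idx in ordering:
        # Each task is initialized with the policy learned on the previous one,
        # so knowledge accumulates until the final target task is reached.
        policy = train_fn(tasks[idx], init_policy=policy)
    return policy
```

In this sketch the policy learned on each intermediate task initializes training on the next, which is the mechanism by which the curriculum is intended to reduce the number of samples needed on the hardest (target) task.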
