Discovering Generalizable Skills via Automated Generation of Diverse Tasks

The learning efficiency and generalization ability of an intelligent agent can be greatly improved by a useful set of skills. However, designing robot skills by hand is often intractable in real-world applications due to the prohibitive effort and expertise it requires. In this work, we introduce Skill Learning In Diversified Environments (SLIDE), a method to discover generalizable skills via automated generation of a diverse set of tasks. Unlike prior work on unsupervised skill discovery, which incentivizes skills to produce different outcomes in the same environment, our method pairs each skill with a unique task produced by a trainable task generator. To encourage generalizable skills to emerge, our method trains each skill to specialize in its paired task while maximizing the diversity of the generated tasks. A task discriminator defined on the robot behaviors in the generated tasks is jointly trained to estimate the evidence lower bound of the diversity objective. The learned skills can then be composed in a hierarchical reinforcement learning algorithm to solve unseen target tasks. We demonstrate that the proposed method can effectively learn a variety of robot skills in two tabletop manipulation domains. Our results suggest that the learned skills improve the robot’s performance on various unseen target tasks compared to existing reinforcement learning and skill learning methods.
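
For intuition, here is one standard way such a diversity objective can be lower-bounded with a learned discriminator. This is a sketch based on the variational bound commonly used in skill-discovery work (e.g., DIAYN), not the paper's exact derivation; the notation is an assumption. Let z be a skill variable drawn from a prior p(z), let τ be the behavior produced by the skill policy π_z in its paired generated task, and let q_φ(z | τ) be a discriminator defined on the robot's behavior. Then

\[
I(z;\tau) \;=\; H(z) - H(z \mid \tau) \;\geq\; H(z) + \mathbb{E}_{z \sim p(z),\, \tau \sim \pi_z}\big[\log q_\phi(z \mid \tau)\big].
\]

The gap in this bound is the expected KL divergence between the true posterior p(z | τ) and q_φ(z | τ), so jointly training the discriminator to predict which skill-task pair produced a given behavior tightens the bound. This matches the abstract's description of a task discriminator trained to estimate the evidence lower bound of the diversity objective.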
