论文信息 - Efficient Bimanual Manipulation Using Learned Task Schemas

Efficient Bimanual Manipulation Using Learned Task Schemas

We address the problem of effectively composing skills to solve sparse-reward tasks in the real world. Given a set of parameterized skills (such as exerting a force or doing a top grasp at a location), our goal is to learn policies that invoke these skills to efficiently solve such tasks. Our insight is that for many tasks, the learning process can be decomposed into learning a state-independent task schema (a sequence of skills to execute) and a policy to choose the parameterizations of the skills in a state-dependent manner. For such tasks, we show that explicitly modeling the schema’s state-independence can yield significant improvements in sample efficiency for model-free reinforcement learning algorithms. Furthermore, these schemas can be transferred to solve related tasks, by simply re-learning the parameterizations with which the skills are invoked. We find that doing so enables learning to solve sparse-reward tasks on real-world robotic systems very efficiently. We validate our approach experimentally over a suite of robotic bimanual manipulation tasks, both in simulation and on real hardware. See videos at http://tinyurl.com/chitnis-schema.

Abhinav Gupta | Shubham Tulsiani | Rohan Chitnis | Saurabh Gupta

[1] Pieter Abbeel,et al. Meta Learning Shared Hierarchies , 2017, ICLR.

[2] Pravesh Ranchod,et al. Reinforcement Learning with Parameterized Actions , 2015, AAAI.

[3] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.

[4] Sergey Levine,et al. Data-Efficient Hierarchical Reinforcement Learning , 2018, NeurIPS.

[5] Sergey Levine,et al. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[6] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.

[7] Oliver Kroemer,et al. Towards learning hierarchical skills for multi-phase manipulation tasks , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[8] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[9] Peter Stone,et al. Deep Reinforcement Learning in Parameterized Action Space , 2015, ICLR.

[10] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[11] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.

[12] Wojciech Zaremba,et al. Domain randomization for transferring deep neural networks from simulation to the real world , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[13] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[14] L. Chrisman. Reasoning About Probabilistic Actions At Multiple Levels of Granularity , 1994 .

[15] Drew Wicke,et al. Hierarchical Approaches for Reinforcement Learning in Parameterized Action Space , 2018, AAAI Spring Symposia.

[16] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17] Ping Hsu. Coordinated control of multiple manipulator systems , 1993, IEEE Trans. Robotics Autom..

[18] Sergey Levine,et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[19] Dylan Hadfield-Menell,et al. Guided search for task and motion plans using learned heuristics , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[20] Leslie Pack Kaelbling,et al. Learning to guide task and motion planning using score-space representation , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[21] Sergey Levine,et al. (CAD)$^2$RL: Real Single-Image Flight without a Single Real Image , 2016, Robotics: Science and Systems.

[22] Dan Klein,et al. Modular Multitask Reinforcement Learning with Policy Sketches , 2016, ICML.

[23] Gregory D. Hager,et al. Combining neural networks and tree search for task and motion planning in challenging environments , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[24] Minoru Asada,et al. Purposive Behavior Acquisition for a Real Robot by Vision-Based Reinforcement Learning , 2005, Machine Learning.

[25] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.

[26] Erann Gat. On the Role of Simulation in the Study of Autonomous Mobile Robots , 2002 .

[27] Emanuel Todorov,et al. Ensemble-CIO: Full-body dynamic motion planning that transfers to physical humanoids , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[28] Tamim Asfour,et al. Programming by demonstration: dual-arm manipulation tasks for humanoid robots , 2004, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566).

[29] Silvio Savarese,et al. SURREAL: Open-Source Reinforcement Learning Framework and Robot Manipulation Benchmark , 2018, CoRL.

[30] Milos Hauskrecht,et al. Hierarchical Solution of Markov Decision Processes using Macro-actions , 1998, UAI.

[31] Danica Kragic,et al. Dual arm manipulation - A survey , 2012, Robotics Auton. Syst..

[32] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.

[33] Tzyh Jong Tarn,et al. Intelligent planning and control for multirobot coordination: An event-based approach , 1996, IEEE Trans. Robotics Autom..

[34] Sergey Levine,et al. Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[35] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.

[36] Abhinav Gupta,et al. PyRobot: An Open-source Robotics Framework for Research and Benchmarking , 2019, ArXiv.

[37] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..

[38] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[39] Leslie Pack Kaelbling,et al. Active Model Learning and Diverse Action Sampling for Task and Motion Planning , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[40] Yuval Davidor,et al. Genetic Algorithms and Robotics - A Heuristic Strategy for Optimization , 1991, World Scientific Series in Robotics and Intelligent Systems.

[41] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[42] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[43] Aude Billard,et al. Combining Dynamical Systems control and programming by demonstration for teaching discrete bimanual coordination tasks to a humanoid robot , 2008, 2008 3rd ACM/IEEE International Conference on Human-Robot Interaction (HRI).

[44] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.