Solving the Real Robot Challenge using Deep Reinforcement Learning

This paper details our winning submission to Phase 1 of the 2021 Real Robot Challenge, in which a three-fingered robot must carry a cube along specified goal trajectories. To solve Phase 1, we use a pure reinforcement learning approach that requires minimal expert knowledge of the robotic system or of robotic grasping in general. A sparse, goal-based reward is combined with Hindsight Experience Replay to teach the control policy to move the cube to the desired x and y coordinates. Simultaneously, a dense, distance-based reward teaches the policy to lift the cube to the desired z coordinate. The policy is trained in simulation with domain randomisation and then transferred to the real robot for evaluation. Although performance tends to worsen after this transfer, our best trained policy successfully lifts the real cube along goal trajectories using an effective pinching grasp. Our approach outperforms all other submissions, including those leveraging more traditional robotic control techniques, and is the first learning-based approach to solve this challenge.
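To make the reward structure described above concrete, the following is a minimal Python sketch rather than the submission's actual implementation. The tolerance `XY_TOLERANCE`, the weight `Z_WEIGHT`, the transition dictionary keys, and both function names are hypothetical and chosen only for illustration. The sketch also assumes that hindsight relabelling replaces only the x and y components of the goal, so that the dense z term still refers to the original target height; the abstract does not state this detail.

```python
import numpy as np

XY_TOLERANCE = 0.02  # hypothetical success radius in the xy-plane (metres)
Z_WEIGHT = 1.0       # hypothetical weight on the dense height term


def compute_reward(achieved_pos, goal_pos):
    """Sparse goal-based reward in xy plus dense distance-based reward in z."""
    achieved_pos = np.asarray(achieved_pos, dtype=float)
    goal_pos = np.asarray(goal_pos, dtype=float)
    # Sparse term: 0 when the cube is within tolerance of the xy goal, else -1.
    xy_error = np.linalg.norm(achieved_pos[:2] - goal_pos[:2])
    sparse_xy = 0.0 if xy_error <= XY_TOLERANCE else -1.0
    # Dense term: penalise the absolute height error at every step.
    dense_z = -Z_WEIGHT * abs(achieved_pos[2] - goal_pos[2])
    return sparse_xy + dense_z


def relabel_with_hindsight(transition, future_achieved_pos):
    """Return a copy of the transition whose xy goal is a position reached later.

    Assumption of this sketch: only x and y are relabelled; z keeps its
    original target so the dense height reward remains informative.
    """
    new_goal = np.array(transition["goal"], dtype=float)
    new_goal[:2] = np.asarray(future_achieved_pos, dtype=float)[:2]
    relabelled = dict(transition)
    relabelled["goal"] = new_goal
    relabelled["reward"] = compute_reward(transition["next_achieved_pos"], new_goal)
    return relabelled
```

Under these assumptions, a transition in which the cube never reached the commanded xy target can still yield a non-negative sparse term after relabelling, while the height penalty is unchanged. This is the property that lets Hindsight Experience Replay extract a learning signal from otherwise unrewarded episodes.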
