ReLMM: Practical RL for Learning Mobile Manipulation Skills Using Only Onboard Sensors

In this paper, we study how robots can autonomously learn skills that require a combination of navigation and grasping. Learning robotic skills in the real world remains challenging without large-scale data collection and supervision. Our aim is to devise a robotic reinforcement learning system that learns navigation and manipulation together, autonomously and without human intervention, enabling continual learning under realistic assumptions. Specifically, our system, ReLMM, can learn continuously on a real-world platform without any environment instrumentation, without human intervention, and without access to privileged information such as maps, object positions, or a global view of the environment. Our method employs a modularized policy with components for manipulation and navigation, where uncertainty over manipulation success drives exploration for the navigation controller, and the manipulation module provides rewards for navigation. We evaluate our method on a room cleanup task, where the robot must navigate to and pick up items scattered on the floor. After a grasp curriculum training phase, ReLMM can learn navigation and grasping together fully automatically, in around 40 hours of real-world training.
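To make the reward structure concrete, below is a minimal Python sketch of the modular interface the abstract describes, not the authors' implementation: the names GraspSuccessEnsemble and navigation_reward are hypothetical, and the ensemble-based uncertainty estimate is an assumption about how "uncertainty over manipulation success" could be computed. The manipulation module's predicted grasp success serves as the navigation reward, and ensemble disagreement supplies an exploration bonus.

```python
import numpy as np

class GraspSuccessEnsemble:
    """Ensemble of grasp-success predictors (illustrative stand-ins).

    Each member maps an onboard camera image to a success probability
    in [0, 1]; the ensemble mean estimates grasp success and the
    ensemble spread estimates epistemic uncertainty.
    """
    def __init__(self, members):
        self.members = members  # callables: image -> probability

    def predict(self, image):
        preds = np.array([m(image) for m in self.members])
        return preds.mean(), preds.std()

def navigation_reward(ensemble, image, bonus_weight=1.0):
    """Reward for the navigation controller: the manipulation module's
    predicted grasp success, plus a bonus where that prediction is
    uncertain, so the robot is drawn both to graspable objects and to
    views it cannot yet judge."""
    p_success, uncertainty = ensemble.predict(image)
    return p_success + bonus_weight * uncertainty

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in predictors; a real system would use trained image models.
    members = [lambda img, b=b: float(np.clip(img.mean() + b, 0.0, 1.0))
               for b in rng.normal(0.0, 0.1, size=5)]
    ensemble = GraspSuccessEnsemble(members)
    frame = rng.random((64, 64))  # fake onboard camera frame
    print(navigation_reward(ensemble, frame))
```

Under this reading, the grasp module is itself trained from the robot's own grasp attempts, so the navigation reward becomes more informative as manipulation improves; the bonus term would shrink as the ensemble members come to agree.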
