Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms

Reinforcement learning methods can achieve significant performance but require a large amount of training data collected on the same robotic platform. A policy trained with expensive data is rendered useless after making even a minor change to the robot hardware. In this paper, we address the challenging problem of adapting a policy, trained to perform a task, to a novel robotic hardware platform given only few demonstrations of robot motion trajectories on the target robot. We formulate it as a few-shot meta-learning problem where the goal is to find a meta-model that captures the common structure shared across different robotic platforms such that data-efficient adaptation can be performed. We achieve such adaptation by introducing a learning framework consisting of a probabilistic gradient-based meta-learning algorithm that models the uncertainty arising from the few-shot setting with a low-dimensional latent variable. We experimentally evaluate our framework on a simulated reaching and a real-robot picking task using 400 simulated robots generated by varying the physical parameters of an existing set of robotic platforms. Our results show that the proposed method can successfully adapt a trained policy to different robotic platforms with novel physical parameters and the superiority of our meta-learning algorithm compared to state-of-the-art methods for the introduced few-shot policy adaptation problem.

[1]  Jakub W. Pachocki,et al.  Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res..

[2]  Ville Kyrki,et al.  Meta Reinforcement Learning for Sim-to-real Domain Adaptation , 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[3]  Andrew J. Davison,et al.  Task-Embedded Control Networks for Few-Shot Imitation Learning , 2018, CoRL.

[4]  Sergey Levine,et al.  Sim2Real View Invariant Visual Servoing by Recurrent Control , 2017, ArXiv.

[5]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[6]  Ville Kyrki,et al.  Few-Shot Model-Based Adaptation in Noisy Conditions , 2020, IEEE Robotics and Automation Letters.

[7]  Danica Kragic,et al.  Deep predictive policy training using reinforcement learning , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[8]  Mubarak Shah,et al.  Task Agnostic Meta-Learning for Few-Shot Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Abhinav Gupta,et al.  Learning to push by grasping: Using multiple tasks for effective learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[10]  Sergey Levine,et al.  One-Shot Visual Imitation Learning via Meta-Learning , 2017, CoRL.

[11]  Marc Peter Deisenroth,et al.  Probabilistic Active Meta-Learning , 2020, NeurIPS.

[12]  Patric Jensfelt,et al.  Adversarial Feature Training for Generalizable Robotic Visuomotor Control , 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[13]  Alex Beatson,et al.  Amortized Bayesian Meta-Learning , 2018, ICLR.

[14]  Sebastian Nowozin,et al.  Meta-Learning Probabilistic Inference for Prediction , 2018, ICLR.

[15]  Yoshua Bengio,et al.  Bayesian Model-Agnostic Meta-Learning , 2018, NeurIPS.

[16]  Danica Kragic,et al.  Imitating by Generating: Deep Generative Models for Imitation of Interactive Tasks , 2019, Frontiers in Robotics and AI.

[17]  Tao Chen,et al.  Hardware Conditioned Policies for Multi-Robot Transfer Learning , 2018, NeurIPS.

[18]  Sergey Levine,et al.  End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..

[19]  Gaurav S. Sukhatme,et al.  Efficient Adaptation for End-to-End Vision-Based Robotic Manipulation , 2020, RSS 2020.

[20]  Pieter Abbeel,et al.  Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments , 2017, ICLR.

[21]  Ville Kyrki,et al.  Affordance Learning for End-to-End Visuomotor Robot Control , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[22]  Sergey Levine,et al.  RoboNet: Large-Scale Multi-Robot Learning , 2019, CoRL.

[23]  Yayi Zou,et al.  Gradient-EM Bayesian Meta-learning , 2020, NeurIPS.

[24]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[25]  Sergey Levine,et al.  Probabilistic Model-Agnostic Meta-Learning , 2018, NeurIPS.

[26]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[27]  Wenlong Huang,et al.  One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control , 2020, ICML.

[28]  Matthew R. Walter,et al.  Jointly Learning to Construct and Control Agents using Deep Reinforcement Learning , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[29]  Sergey Levine,et al.  Meta-Reinforcement Learning of Structured Exploration Strategies , 2018, NeurIPS.

[30]  Chelsea Finn,et al.  Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[31]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Razvan Pascanu,et al.  Meta-Learning with Latent Embedding Optimization , 2018, ICLR.

[33]  Sergey Levine,et al.  Learning modular neural network policies for multi-task and multi-robot transfer , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[34]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[35]  Sergey Levine,et al.  One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning , 2018, Robotics: Science and Systems.

[36]  Emilio Frazzoli,et al.  Incremental Sampling-based Algorithms for Optimal Motion Planning , 2010, Robotics: Science and Systems.

[37]  Karol Hausman,et al.  Learning an Embedding Space for Transferable Robot Skills , 2018, ICLR.

[38]  Thomas L. Griffiths,et al.  Recasting Gradient-Based Meta-Learning as Hierarchical Bayes , 2018, ICLR.

[39]  Danica Kragic,et al.  Data-efficient visuomotor policy training using reinforcement learning and generative models , 2020, ArXiv.

[40]  Marcin Andrychowicz,et al.  Sim-to-Real Transfer of Robotic Control with Dynamics Randomization , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[41]  Marco Pavone,et al.  Meta-Learning Priors for Efficient Online Bayesian Regression , 2018, WAFR.

[42]  Andrew J. Davison,et al.  Learning One-Shot Imitation From Humans Without Humans , 2019, IEEE Robotics and Automation Letters.

[43]  Sergey Levine,et al.  Deep visual foresight for planning robot motion , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[44]  Marcin Andrychowicz,et al.  One-Shot Imitation Learning , 2017, NIPS.

[45]  Sergey Levine,et al.  Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning , 2018, ICLR.

[46]  Chelsea Finn,et al.  Deep Reinforcement Learning amidst Lifelong Non-Stationarity , 2020, ArXiv.