论文信息 - Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms

Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic Platforms

Reinforcement learning methods can achieve significant performance but require a large amount of training data collected on the same robotic platform. A policy trained with expensive data is rendered useless after making even a minor change to the robot hardware. In this paper, we address the challenging problem of adapting a policy, trained to perform a task, to a novel robotic hardware platform given only few demonstrations of robot motion trajectories on the target robot. We formulate it as a few-shot meta-learning problem where the goal is to find a meta-model that captures the common structure shared across different robotic platforms such that data-efficient adaptation can be performed. We achieve such adaptation by introducing a learning framework consisting of a probabilistic gradient-based meta-learning algorithm that models the uncertainty arising from the few-shot setting with a low-dimensional latent variable. We experimentally evaluate our framework on a simulated reaching and a real-robot picking task using 400 simulated robots generated by varying the physical parameters of an existing set of robotic platforms. Our results show that the proposed method can successfully adapt a trained policy to different robotic platforms with novel physical parameters and the superiority of our meta-learning algorithm compared to state-of-the-art methods for the introduced few-shot policy adaptation problem.

[1] Jakub W. Pachocki,et al. Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res..

[2] Ville Kyrki,et al. Meta Reinforcement Learning for Sim-to-real Domain Adaptation , 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[3] Andrew J. Davison,et al. Task-Embedded Control Networks for Few-Shot Imitation Learning , 2018, CoRL.

[4] Sergey Levine,et al. Sim2Real View Invariant Visual Servoing by Recurrent Control , 2017, ArXiv.

[5] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.

[6] Ville Kyrki,et al. Few-Shot Model-Based Adaptation in Noisy Conditions , 2020, IEEE Robotics and Automation Letters.

[7] Danica Kragic,et al. Deep predictive policy training using reinforcement learning , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[8] Mubarak Shah,et al. Task Agnostic Meta-Learning for Few-Shot Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Abhinav Gupta,et al. Learning to push by grasping: Using multiple tasks for effective learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[10] Sergey Levine,et al. One-Shot Visual Imitation Learning via Meta-Learning , 2017, CoRL.

[11] Marc Peter Deisenroth,et al. Probabilistic Active Meta-Learning , 2020, NeurIPS.

[12] Patric Jensfelt,et al. Adversarial Feature Training for Generalizable Robotic Visuomotor Control , 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[13] Alex Beatson,et al. Amortized Bayesian Meta-Learning , 2018, ICLR.

[14] Sebastian Nowozin,et al. Meta-Learning Probabilistic Inference for Prediction , 2018, ICLR.

[15] Yoshua Bengio,et al. Bayesian Model-Agnostic Meta-Learning , 2018, NeurIPS.

[16] Danica Kragic,et al. Imitating by Generating: Deep Generative Models for Imitation of Interactive Tasks , 2019, Frontiers in Robotics and AI.

[17] Tao Chen,et al. Hardware Conditioned Policies for Multi-Robot Transfer Learning , 2018, NeurIPS.

[18] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..

[19] Gaurav S. Sukhatme,et al. Efficient Adaptation for End-to-End Vision-Based Robotic Manipulation , 2020, RSS 2020.

[20] Pieter Abbeel,et al. Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments , 2017, ICLR.

[21] Ville Kyrki,et al. Affordance Learning for End-to-End Visuomotor Robot Control , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[22] Sergey Levine,et al. RoboNet: Large-Scale Multi-Robot Learning , 2019, CoRL.

[23] Yayi Zou,et al. Gradient-EM Bayesian Meta-learning , 2020, NeurIPS.

[24] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[25] Sergey Levine,et al. Probabilistic Model-Agnostic Meta-Learning , 2018, NeurIPS.

[26] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.

[27] Wenlong Huang,et al. One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control , 2020, ICML.

[28] Matthew R. Walter,et al. Jointly Learning to Construct and Control Agents using Deep Reinforcement Learning , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[29] Sergey Levine,et al. Meta-Reinforcement Learning of Structured Exploration Strategies , 2018, NeurIPS.

[30] Chelsea Finn,et al. Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[31] Tao Xiang,et al. Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32] Razvan Pascanu,et al. Meta-Learning with Latent Embedding Optimization , 2018, ICLR.

[33] Sergey Levine,et al. Learning modular neural network policies for multi-task and multi-robot transfer , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[34] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[35] Sergey Levine,et al. One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning , 2018, Robotics: Science and Systems.

[36] Emilio Frazzoli,et al. Incremental Sampling-based Algorithms for Optimal Motion Planning , 2010, Robotics: Science and Systems.

[37] Karol Hausman,et al. Learning an Embedding Space for Transferable Robot Skills , 2018, ICLR.

[38] Thomas L. Griffiths,et al. Recasting Gradient-Based Meta-Learning as Hierarchical Bayes , 2018, ICLR.

[39] Danica Kragic,et al. Data-efficient visuomotor policy training using reinforcement learning and generative models , 2020, ArXiv.

[40] Marcin Andrychowicz,et al. Sim-to-Real Transfer of Robotic Control with Dynamics Randomization , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[41] Marco Pavone,et al. Meta-Learning Priors for Efficient Online Bayesian Regression , 2018, WAFR.

[42] Andrew J. Davison,et al. Learning One-Shot Imitation From Humans Without Humans , 2019, IEEE Robotics and Automation Letters.

[43] Sergey Levine,et al. Deep visual foresight for planning robot motion , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[44] Marcin Andrychowicz,et al. One-Shot Imitation Learning , 2017, NIPS.

[45] Sergey Levine,et al. Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning , 2018, ICLR.

[46] Chelsea Finn,et al. Deep Reinforcement Learning amidst Lifelong Non-Stationarity , 2020, ArXiv.