Imitation Learning via Simultaneous Optimization of Policies and Auxiliary Trajectories

Imitation learning (IL) is a frequently used approach for data-efficient policy learning. Many IL methods, such as Dataset Aggregation (DAgger), combat challenges like distributional shift by interacting with oracular experts. Unfortunately, assuming access to oracular experts is often unrealistic in practice; data used in IL frequently comes from offline processes such as lead-through or teleoperation. In this paper, we present a novel imitation learning technique called Collocation for Demonstration Encoding (CoDE) that operates using only a fixed set of trajectory demonstrations. We circumvent challenges with methods like back-propagation through time by introducing an auxiliary trajectory network, which takes inspiration from collocation techniques in optimal control. Compared with standard behavioral cloning methods, our method generalizes better and reproduces the demonstrated behavior more accurately from fewer guiding trajectories. We present simulation results on a 7-degree-of-freedom (DoF) robotic manipulator that learns lifting, target-reaching, and obstacle-avoidance behaviors.
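The abstract describes the core idea only at a high level, so the sketch below is a minimal, hypothetical illustration of collocation-style joint optimization written in PyTorch; it is not the authors' CoDE implementation. It assumes simple single-integrator dynamics x_{t+1} = x_t + dt * pi(x_t), a placeholder demonstration, and an arbitrary penalty weight, and all names (`policy`, `aux_traj`, `dt`) are illustrative.

```python
# Hypothetical sketch of collocation-style imitation learning (not the paper's code).
# The auxiliary trajectory is a free decision variable optimized jointly with the
# policy; a penalty ties consecutive auxiliary states to the policy's one-step
# prediction, so gradients never flow through a long rollout (no BPTT).
import torch
import torch.nn as nn

dt = 0.01                       # integration step (assumed)
demo = torch.randn(100, 7)      # placeholder demonstrated joint-space trajectory
T, dim = demo.shape

# Policy maps state -> velocity command.
policy = nn.Sequential(nn.Linear(dim, 128), nn.Tanh(), nn.Linear(128, dim))

# Auxiliary trajectory: free variables, initialized at the demonstration.
aux_traj = nn.Parameter(demo.clone())

opt = torch.optim.Adam(list(policy.parameters()) + [aux_traj], lr=1e-3)

for step in range(2000):
    opt.zero_grad()
    # Imitation term: auxiliary trajectory should stay close to the demonstration.
    imitation = ((aux_traj - demo) ** 2).mean()
    # Collocation (dynamics-consistency) term: each auxiliary state should be
    # reachable from the previous one under the policy's one-step dynamics.
    pred_next = aux_traj[:-1] + dt * policy(aux_traj[:-1])
    consistency = ((aux_traj[1:] - pred_next) ** 2).mean()
    loss = imitation + 10.0 * consistency   # penalty weight chosen arbitrarily
    loss.backward()
    opt.step()
```

The point mirrored from the abstract is that, as in direct collocation for optimal control, the trajectory itself is a decision variable, so the policy receives only one-step gradients rather than gradients propagated through an entire rollout.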
