LazyDAgger: Reducing Context Switching in Interactive Imitation Learning

Corrective interventions while a robot is learning to automate a task provide an intuitive way for a human supervisor to assist the robot and convey information about desired behavior. However, these interventions can impose a significant burden on the human supervisor, as each intervention interrupts other work the human is doing, incurs latency with each context switch between supervisor and autonomous control, and takes time to perform. We present LazyDAgger, which extends the interactive imitation learning (IL) algorithm SafeDAgger to reduce context switches between supervisor and autonomous control. We find that LazyDAgger improves the performance and robustness of the learned policy during both learning and execution while limiting the burden on the supervisor. Simulation experiments suggest that LazyDAgger can reduce context switches by an average of 60% over SafeDAgger on three continuous control tasks while maintaining state-of-the-art policy performance. In physical fabric manipulation experiments with an ABB YuMi robot, LazyDAgger reduces context switches by 60% while achieving a 60% higher success rate than SafeDAgger at execution time.
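
The abstract describes the mechanism only at a high level, so the following is a minimal sketch of the kind of supervisor/autonomy switching rule it alludes to, assuming a SafeDAgger-style learned discrepancy estimator and a pair of asymmetric (hysteresis) thresholds for handing control to the supervisor and back. The environment interface, the names rollout_with_switching, discrepancy, tau_high, and tau_low, and the threshold values are illustrative assumptions, not the paper's implementation.

# Sketch: hysteresis-based switching between robot and supervisor control.
# Assumed (not from the abstract): discrepancy(s) estimates how far the robot's
# action is from the supervisor's. Control passes to the supervisor when the
# estimate exceeds tau_high and returns to the robot only once it falls below
# tau_low, so noise around a single threshold cannot cause rapid back-and-forth
# context switches.

class ToyEnv:
    """Trivial 1-D environment, included only so the sketch runs end to end."""

    def reset(self):
        self.x = 0.0
        return self.x

    def step(self, action):
        self.x += action
        done = abs(self.x) > 10.0
        return self.x, done


def rollout_with_switching(env, robot_policy, supervisor_policy, discrepancy,
                           tau_high=0.5, tau_low=0.25, horizon=200):
    """Run one episode, collecting supervisor labels and counting context switches."""
    state = env.reset()
    supervisor_in_control = False
    context_switches = 0          # count each handoff to the supervisor
    labels = []                   # (state, supervisor_action) pairs from interventions

    for _ in range(horizon):
        d = discrepancy(state)
        if not supervisor_in_control and d > tau_high:
            supervisor_in_control = True       # hand control to the supervisor
            context_switches += 1
        elif supervisor_in_control and d < tau_low:
            supervisor_in_control = False      # return control to the robot

        if supervisor_in_control:
            action = supervisor_policy(state)
            labels.append((state, action))     # labeled data for retraining the policy
        else:
            action = robot_policy(state)

        state, done = env.step(action)
        if done:
            break

    return labels, context_switches


if __name__ == "__main__":
    env = ToyEnv()
    robot = lambda s: 0.1                               # hypothetical learner policy
    supervisor = lambda s: -0.1 * s                     # hypothetical corrective supervisor
    disc = lambda s: abs(robot(s) - supervisor(s))      # stand-in discrepancy estimate
    labels, switches = rollout_with_switching(env, robot, supervisor, disc)
    print(f"{len(labels)} supervisor labels collected over {switches} context switches")

The design point the sketch illustrates is the gap between tau_high and tau_low: a single shared threshold would let small fluctuations in the discrepancy estimate toggle control repeatedly, whereas the hysteresis band keeps the supervisor engaged for a sustained correction before handing control back, which is consistent with the reduction in context switches the abstract reports.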
