Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment

With the rapid growth of computing power and recent advances in deep learning, we have witnessed impressive demonstrations of novel robot capabilities in research settings. Nonetheless, these learning systems exhibit brittle generalization and require excessive amounts of training data for practical tasks. To harness the capabilities of state-of-the-art robot learning models while embracing their imperfections, we present Sirius, a principled framework for humans and robots to collaborate through a division of work. In this framework, partially autonomous robots handle the major portion of decision-making where they work reliably; meanwhile, human operators monitor the process and intervene in challenging situations. Such a human-robot team ensures safe deployment in complex tasks. Furthermore, we introduce a new learning algorithm that improves the policy's performance using the data collected during task execution. The core idea is to re-weight training samples with approximated human trust and optimize the policies with weighted behavioral cloning. We evaluate Sirius in simulation and on real hardware, showing that it consistently outperforms baselines on a collection of contact-rich manipulation tasks, achieving an 8% boost in simulation and a 27% boost on real hardware over state-of-the-art methods, with twice as fast convergence and an 85% reduction in memory size. Videos and code are available at https://ut-austin-rpl.github.io/sirius/.
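
To make the core idea concrete, here is a minimal sketch of weighted behavioral cloning with intervention-derived sample weights. The specific weighting scheme (a fixed upweighting of human-intervention samples as a proxy for human trust) and the names `policy`, `batch`, `intervention_weight`, and `robot_weight` are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
# A minimal sketch of the weighted behavioral cloning idea described above.
# The fixed per-class weights are an illustrative assumption; Sirius's actual
# weighting scheme may differ. `policy` and `batch` are hypothetical inputs.
import torch
import torch.nn.functional as F

def weighted_bc_loss(policy, batch, intervention_weight=2.0, robot_weight=1.0):
    """Behavioral cloning loss with per-sample weights.

    batch["obs"]:          observations, shape (B, obs_dim)
    batch["actions"]:      executed actions, shape (B, act_dim)
    batch["intervention"]: 1.0 where a human intervened, else 0.0, shape (B,)
    """
    pred_actions = policy(batch["obs"])                        # (B, act_dim)
    per_sample = F.mse_loss(pred_actions, batch["actions"],
                            reduction="none").mean(dim=-1)     # (B,)
    # Approximate human trust: intervention samples are weighted higher,
    # since a human took over precisely where the policy was unreliable.
    weights = torch.where(batch["intervention"].bool(),
                          torch.full_like(per_sample, intervention_weight),
                          torch.full_like(per_sample, robot_weight))
    return (weights * per_sample).mean()
```

The design choice here is that intervention timesteps carry a strong training signal: the human's corrective actions mark exactly the states where the autonomous policy failed, so upweighting them focuses the policy update on its weakest regions while still learning from the robot's reliable autonomous experience.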
