Improving Robot-Centric Learning from Demonstration via Personalized Embeddings

Learning from demonstration (LfD) techniques seek to enable novice users to teach robots novel tasks in the real world. However, prior work has shown that robot-centric LfD approaches, such as Dataset Aggregation (DAgger), do not perform well with human teachers. DAgger requires a human demonstrator to provide corrective feedback to the learner either in real time, which can degrade performance because of suboptimal human labels, or post hoc, which is time-intensive and often infeasible. To address this problem, we present Mutual Information-driven Metalearning from Demonstration (MIND MELD), which metalearns a mapping from poor-quality human labels to predicted ground-truth labels, thereby improving upon the performance of prior LfD approaches for DAgger-based training. The key to our approach is mutual information maximization via variational inference: MIND MELD learns a meaningful, personalized embedding for each demonstrator, and this embedding informs the mapping from human-provided labels to predicted ground-truth labels. We demonstrate our framework in a synthetic domain and in a human-subjects experiment, showing that our approach improves upon the corrective labels provided by a human demonstrator by 63%.
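The abstract names mutual information maximization via variational inference as the core mechanism. As a minimal illustration, the Python sketch below evaluates the standard variational (Barber-Agakov) lower bound on mutual information, I(w; z) >= H(w) + E[log q(w | z)], the same kind of bound used by InfoGAN. It uses a hypothetical toy problem with two demonstrators. Everything in the sketch is an assumption for illustration: the distributions `P_W` and `P_Z_GIVEN_W`, all helper names, and the grid search that stands in for gradient-based training. It is not MIND MELD's network or exact objective.

```python
import math

# Hypothetical toy setting (illustration only, not MIND MELD's model):
# w is a discrete "personalized embedding" (which of two demonstrators labeled),
# z is an observed label deviation (0 or 1).
P_W = {0: 0.5, 1: 0.5}
P_Z_GIVEN_W = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}


def joint():
    """Return p(w, z) keyed by (w, z)."""
    return {(w, z): P_W[w] * pz for w in P_W for z, pz in P_Z_GIVEN_W[w].items()}


def marginal_z(pj):
    """Return p(z) by summing the joint over w."""
    p_z = {}
    for (_, z), p in pj.items():
        p_z[z] = p_z.get(z, 0.0) + p
    return p_z


def true_mutual_information():
    """Exact I(w; z) in nats, for comparison with the bound."""
    pj = joint()
    p_z = marginal_z(pj)
    return sum(p * math.log(p / (P_W[w] * p_z[z])) for (w, z), p in pj.items() if p > 0)


def variational_lower_bound(q):
    """Barber-Agakov bound: I(w; z) >= H(w) + E_p(w,z)[log q(w | z)].

    q maps (w, z) to q(w | z), a variational posterior that tries to
    recover the embedding w from the observation z.
    """
    h_w = -sum(p * math.log(p) for p in P_W.values() if p > 0)
    return h_w + sum(p * math.log(q[(w, z)]) for (w, z), p in joint().items() if p > 0)


def q_from_params(a, b):
    """Variational posterior with q(w=1 | z=0) = a and q(w=1 | z=1) = b."""
    return {(1, 0): a, (0, 0): 1 - a, (1, 1): b, (0, 1): 1 - b}


def maximize_bound(steps=100):
    """Grid-search q to tighten the bound (a stand-in for gradient ascent)."""
    grid = [i / steps for i in range(1, steps)]
    return max((variational_lower_bound(q_from_params(a, b)), a, b)
               for a in grid for b in grid)


if __name__ == "__main__":
    mi = true_mutual_information()
    best, a, b = maximize_bound()
    print(f"true I(w;z)     = {mi:.4f} nats")
    print(f"uniform q bound = {variational_lower_bound(q_from_params(0.5, 0.5)):.4f}")
    print(f"best grid bound = {best:.4f} at a={a:.2f}, b={b:.2f}")
```

An uninformative q gives a bound of zero. Tightening the bound over q drives q toward the true posterior p(w | z), at which point the bound equals I(w; z). In a neural setting, q would typically be an encoder network trained by gradient ascent on this bound rather than by grid search.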
