Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration

We present a method for learning a human-robot collaboration policy from human-human collaboration demonstrations. An effective robot assistant must handle the diverse human behaviors shown in the demonstrations and remain robust when humans adjust their strategies during online task execution. Our method co-optimizes a human policy and a robot policy in an interactive learning process: the human policy learns to generate diverse and plausible collaborative behaviors from demonstrations, while the robot policy learns to assist by estimating the unobserved latent strategy of its human collaborator. Across a 2D strategy game, a human-robot handover task, and a multi-step collaborative manipulation task, our method outperforms the alternatives both in simulated evaluations and when executing the tasks with a real human operator in the loop. Supplementary materials and videos are available at https://sites.google.com/view/co-gail-web/home
