Human-AI Shared Control via Frequency-based Policy Dissection

Human-AI shared control allows humans to interact and collaborate with autonomous agents to accomplish control tasks in complex environments. Previous reinforcement learning (RL) methods have attempted goal-conditioned designs to achieve human-controllable policies, at the cost of redesigning the reward function and the training paradigm. Inspired by the neuroscience approach of investigating the motor cortex in primates, we develop a simple yet effective frequency-based approach called Policy Dissection, which aligns the intermediate representation of a learned neural controller with the kinematic attributes of the agent's behavior. Without modifying the neural controller or retraining the model, the proposed approach can convert a given RL-trained policy into a human-controllable policy. We evaluate the proposed approach on RL tasks such as autonomous driving and locomotion. The experiments show that the human-AI shared control system enabled by Policy Dissection substantially improves performance and safety in unseen traffic scenes, suggesting better test-time generalizability. With a human in the inference loop, the locomotion robots also exhibit versatile, controllable motion skills even though they are trained only to move forward, demonstrating the ability to achieve task transfer. We further provide demonstrations in various other environments to show the generality of our method and the wider applications of human-AI collaboration. Our results suggest a promising direction for implementing human-AI shared autonomy by interpreting the learned representations of autonomous agents. Code and demo videos are available at https://metadriverse.github.io/policydissect.
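To make the frequency-based alignment concrete, the sketch below illustrates one plausible reading of the idea: record hidden-unit activations and kinematic signals (e.g., yaw rate) over a rollout, estimate the dominant frequency of each via an FFT, and pair each kinematic attribute with the unit whose activation oscillates at the closest frequency. This is a minimal sketch under our own assumptions, not the paper's implementation; the function names and rollout arrays (`dominant_freq`, `match_units_to_attributes`, `unit_acts`, `kinematics`) are hypothetical placeholders.

```python
import numpy as np

def dominant_freq(signal: np.ndarray) -> float:
    """Return the dominant frequency (cycles per step) of a 1-D signal."""
    signal = signal - signal.mean()              # drop the DC component
    spectrum = np.abs(np.fft.rfft(signal))       # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal))         # matching frequency bins
    return float(freqs[np.argmax(spectrum)])

def match_units_to_attributes(unit_acts: np.ndarray, kinematics: dict) -> dict:
    """Pair each kinematic attribute with the hidden unit whose activation
    oscillates at the closest dominant frequency.

    unit_acts:  (T, num_units) activations recorded over one rollout
    kinematics: attribute name -> (T,) signal, e.g. {"yaw_rate": ...}
    """
    unit_freqs = np.array([dominant_freq(unit_acts[:, i])
                           for i in range(unit_acts.shape[1])])
    return {name: int(np.argmin(np.abs(unit_freqs - dominant_freq(sig))))
            for name, sig in kinematics.items()}
```

At test time, one could then clamp the matched unit's activation during the forward pass whenever the human issues the corresponding command, leaving all policy weights untouched; this is the sense in which an RL-trained policy becomes human-controllable without retraining.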
