Efficiently Guiding Imitation Learning Agents with Human Gaze

Human gaze is known to be an intention-revealing signal in human task demonstrations. In this work, we use gaze cues from human demonstrators to enhance the performance of agents trained via three popular imitation learning methods: behavioral cloning (BC), behavioral cloning from observation (BCO), and Trajectory-ranked Reward EXtrapolation (T-REX). Motivated by similarities between the attention of reinforcement learning agents and human gaze, we propose a novel, computationally efficient way to use gaze data: an auxiliary loss function that guides a network toward higher activations in the image regions where the demonstrator's gaze fixated. This work is a step toward augmenting the training of any existing convolutional imitation learning agent with auxiliary gaze data. Our coverage-based gaze loss (CGL) guides learning toward a better reward function or policy without adding learnable parameters and without requiring gaze data at test time. We find that our approach improves performance by 95% for BC, 343% for BCO, and 390% for T-REX, averaged over 20 different Atari games. We also find that, compared to AGIL, a prior state-of-the-art gaze-assisted imitation learning method, our method achieves better performance and learns more efficiently from fewer demonstrations. We further interpret trained CGL agents with a saliency-map visualization method to explain their performance. Finally, we show that CGL can help alleviate the well-known causal confusion problem in imitation learning.
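
To make the idea concrete, below is a minimal PyTorch sketch of a coverage-style auxiliary gaze loss. It normalizes a convolutional activation map and the human gaze heatmap into spatial distributions, then penalizes only the regions where the activation falls short of the gaze density, so the network is pushed to cover the fixated regions without being discouraged from attending elsewhere. The function name, the normalization scheme, and the one-sided KL-style form are illustrative assumptions for this sketch, not necessarily the paper's exact CGL formulation.

import torch

def coverage_gaze_loss(activations, gaze_map, eps=1e-8):
    # activations: (B, H, W) spatial activation map from a convolutional
    #   layer of the agent (e.g. channel-summed features, resized to the
    #   gaze heatmap's resolution).
    # gaze_map:    (B, H, W) human gaze heatmap for the same frames.
    # Normalize both maps into probability distributions over pixels.
    a = activations.flatten(1)
    a = a / (a.sum(dim=1, keepdim=True) + eps)
    g = gaze_map.flatten(1)
    g = g / (g.sum(dim=1, keepdim=True) + eps)
    # One-sided penalty: pixels where the human fixated but the network is
    # under-activated contribute g * log(g / a); pixels the network attends
    # to beyond the gaze distribution cost nothing (min(a, g) = g there).
    shortfall = torch.minimum(a, g)
    loss = (g * (torch.log(g + eps) - torch.log(shortfall + eps))).sum(dim=1)
    return loss.mean()

In training, this term would be added to the primary imitation objective, e.g. total_loss = bc_loss + lam * coverage_gaze_loss(conv_activations, gaze_map), where lam is a hypothetical weighting constant. At test time the gaze term is simply dropped, consistent with the claim that no gaze data or extra parameters are needed during deployment.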

[1] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method. arXiv, 2012.

[2] Stephen P. Boyd et al. Convex Optimization. Cambridge University Press, 2004.

[3] Stefano Ermon et al. Label-Free Supervision of Neural Networks with Physics and Domain Knowledge. AAAI, 2016.

[4] Stuart J. Russell et al. Inverse reinforcement learning for video games. arXiv, 2018.

[5] Stefano Ermon et al. Generative Adversarial Imitation Learning. NIPS, 2016.

[6] Sergey Levine et al. Learning Robust Rewards with Adversarial Inverse Reinforcement Learning. ICLR, 2018.

[7] Andrea Lockerd Thomaz et al. Human Gaze Following for Human-Robot Interaction. IROS, 2018.

[8] Scott Niekum et al. Understanding Teacher Gaze Patterns for Robot Learning. CoRL, 2019.

[9] Geoffrey J. Gordon et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. AISTATS, 2011.

[10] Mark R. Wilson et al. Cheating experience: Guiding novices to adopt the gaze strategies of experts expedites the learning of technical laparoscopic skills. Surgery, 2012.

[11] Yusuke Yamani et al. Following Expert's Eyes: Evaluation of the Effectiveness of a Gaze-Based Training Intervention on Young Drivers' Latent Hazard Anticipation Skills. 2017.

[12] Bo Liu et al. Human Gaze Assisted Artificial Intelligence: A Review. IJCAI, 2020.

[13] Richard S. Sutton et al. Reinforcement Learning: An Introduction. MIT Press, 1998.

[14] Stefano Nolfi et al. Selective attention enables action selection: evidence from evolutionary robotics experiments. Adaptive Behavior, 2013.

[15] Dana H. Ballard et al. An Initial Attempt of Combining Visual Selective Attention with Deep Reinforcement Learning. arXiv, 2018.

[16] D. Ballard et al. Eye movements in natural behavior. Trends in Cognitive Sciences, 2005.

[17] Shane Legg et al. Human-level control through deep reinforcement learning. Nature, 2015.

[18] Tom Schaul et al. Rainbow: Combining Improvements in Deep Reinforcement Learning. AAAI, 2018.

[19] Jonathan Dodge et al. Visualizing and Understanding Atari Agents. ICML, 2018.

[20] Sergey Levine et al. Causal Confusion in Imitation Learning. NeurIPS, 2019.

[21] J. Platt et al. Constrained Differential Optimization for Neural Networks. 1988.

[22] Alejandro Bordallo et al. Physical symbol grounding and instance learning through demonstration and eye tracking. ICRA, 2017.

[23] Xiaodong Gu et al. Towards dropout training for convolutional neural networks. Neural Networks, 2015.

[24] M. Land. Vision, eye movements, and natural behavior. Visual Neuroscience, 2009.

[25] Abhishek Das et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. ICCV, 2017.

[26] Michael C. Yip et al. Adversarial Imitation via Variational Inverse Reinforcement Learning. ICLR, 2019.

[27] M. Posner et al. Orienting of Attention. The Quarterly Journal of Experimental Psychology, 1980.

[28] Demis Hassabis et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.

[29] Marc G. Bellemare et al. The Arcade Learning Environment: An Evaluation Platform for General Agents. Journal of Artificial Intelligence Research, 2013.

[30] Luxin Zhang et al. Atari-HEAD: Atari Human Eye-Tracking and Demonstration Dataset. arXiv, 2019.

[31] Nitish Srivastava et al. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 2014.

[32] Mary M. Hayhoe et al. Task and context determine where you look. Journal of Vision, 2016.

[33] Peter Stone et al. Leveraging Human Guidance for Deep Reinforcement Learning Tasks. IJCAI, 2019.

[34] Sergey Levine et al. Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization. ICML, 2016.

[35] Ming Liu et al. Gaze Training by Modulated Dropout Improves Imitation Learning. IROS, 2019.

[36] Stefan Schaal. Learning from Demonstration. NIPS, 1996.

[37] D. Ballard et al. Eye guidance in natural vision: reinterpreting salience. Journal of Vision, 2011.

[38] Peter Stone et al. Behavioral Cloning from Observation. IJCAI, 2018.

[39] Christof Koch et al. A Model of Saliency-Based Visual Attention for Rapid Scene Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998.

[40] Martial Hebert et al. Learning Transferable Policies for Monocular Reactive MAV Control. ISER, 2016.

[41] Nando de Freitas et al. Playing hard exploration games by watching YouTube. NeurIPS, 2018.

[42] Shane Legg et al. Reward learning from human preferences and demonstrations in Atari. NeurIPS, 2018.

[43] Alexei A. Efros et al. Image-to-Image Translation with Conditional Adversarial Networks. CVPR, 2017.

[44] Jimmy Ba et al. Adam: A Method for Stochastic Optimization. ICLR, 2015.

[45] Yuval Tassa et al. MuJoCo: A physics engine for model-based control. IROS, 2012.

[46] Frédo Durand et al. What Do Different Evaluation Metrics Tell Us About Saliency Models? IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.

[47] Thierry Baccino et al. Methods for comparing scanpaths and saliency maps: strengths and weaknesses. Behavior Research Methods, 2012.

[48] Brett Browning et al. A survey of robot learning from demonstration. Robotics and Autonomous Systems, 2009.

[49] Nitish Srivastava et al. Improving neural networks by preventing co-adaptation of feature detectors. arXiv, 2012.

[50] Ming Liu et al. A gaze model improves autonomous driving. ETRA, 2019.

[51] Luxin Zhang et al. AGIL: Learning Attention from Human for Visuomotor Tasks. ECCV, 2018.

[52] Prabhat Nagarajan et al. Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations. ICML, 2019.

[53] Gunnar Farnebäck. Two-Frame Motion Estimation Based on Polynomial Expansion. SCIA, 2003.

[54] David Whitney et al. Periphery-Fovea Multi-Resolution Driving Model Guided by Human Attention. WACV, 2020.