Efficiently Guiding Imitation Learning Algorithms with Human Gaze

Human gaze is known to be an intention-revealing signal in human demonstrations of tasks. In this work, we use gaze cues from human demonstrators to enhance the performance of state-of-the-art inverse reinforcement learning (IRL) and behavioral cloning (BC) algorithms. We propose a novel, computationally efficient approach to utilizing gaze data: encoding the human's attention as part of an auxiliary loss function, without adding any learnable parameters to those models and without requiring gaze data at test time. The auxiliary loss encourages the network's convolutional activations to be high in regions where the human's gaze fixated. We show how to augment any existing convolutional architecture with our auxiliary coverage-based gaze loss (CGL), which can guide learning toward a better reward function or policy. We show that our proposed approach improves the performance of both BC and IRL methods on a variety of Atari games. We also compare against two baseline methods for incorporating gaze into imitation learning: our approach outperforms gaze-modulated dropout (GMD) and is comparable to AGIL, a method that uses gaze as an input to the network and thus increases the number of learnable parameters.
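To make the mechanism concrete, below is a minimal PyTorch sketch of how such an auxiliary gaze loss could be wired into a BC or IRL training loop. Only the overall idea comes from the abstract: penalize regions the human fixated on but the network's convolutional activations ignore, add no learnable parameters, and require no gaze at test time. The function name `coverage_gaze_loss`, the hinge-style deficit term, and the `lambda_gaze` weight are illustrative assumptions, not the paper's exact formulation of CGL.

```python
import torch
import torch.nn.functional as F

def coverage_gaze_loss(activations, gaze_map, eps=1e-8):
    """Sketch of a coverage-based auxiliary gaze loss.

    activations: (B, C, H, W) feature maps from a chosen conv layer.
    gaze_map:    (B, 1, H', W') human gaze heatmap for the same frames.

    Penalizes gaze-fixated regions that the activations fail to "cover";
    the paper's exact CGL formulation may differ from this instantiation.
    """
    # Collapse channels into a single spatial saliency map.
    act = activations.abs().mean(dim=1, keepdim=True)          # (B, 1, H, W)
    # Resize the gaze heatmap to the activation resolution.
    gaze = F.interpolate(gaze_map, size=act.shape[-2:],
                         mode="bilinear", align_corners=False)
    # Normalize both maps to sum to 1 per frame so they are comparable.
    act = act / (act.sum(dim=(2, 3), keepdim=True) + eps)
    gaze = gaze / (gaze.sum(dim=(2, 3), keepdim=True) + eps)
    # Penalize only the deficit: places the human looked but the net did not.
    deficit = F.relu(gaze - act)
    return deficit.sum(dim=(2, 3)).mean()

# Usage: add to the imitation loss with a small weight; no new parameters
# are introduced, and gaze_map is only needed during training.
# total_loss = bc_loss + lambda_gaze * coverage_gaze_loss(conv_feats, gaze)
```

Because the loss only reads existing activations, it can be attached to any convolutional policy or reward network, and it is simply dropped at test time when no gaze is available.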
