An Initial Attempt of Combining Visual Selective Attention with Deep Reinforcement Learning

Visual attention serves as a feature selection mechanism in the perceptual system. Motivated by Broadbent's leaky filter model of selective attention, we evaluate how such a mechanism could be implemented and how it affects the learning process of deep reinforcement learning. We visualize and analyze the feature maps of DQN on the toy problem Catch, and propose an approach to combining visual selective attention with deep reinforcement learning, sketched below. We experiment with optical flow-based attention and A2C on Atari games. Experimental results show that visual selective attention can improve sample efficiency on the tested games. We also observe an intriguing relation between attention and batch normalization.
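
As a rough illustration of how optical flow-based attention could serve as a preprocessing step for the agent's observations, the sketch below uses OpenCV's Farnebäck dense optical flow to turn per-pixel motion magnitude into a soft attention mask, then attenuates unattended regions in the spirit of Broadbent's leaky filter. The mask-construction details (blurring, normalization, the `leak` parameter) are illustrative assumptions, not the paper's exact recipe.

```python
import cv2
import numpy as np

def flow_attention_mask(prev_frame, curr_frame, blur_ksize=15, eps=1e-8):
    """Build a soft attention mask from dense optical flow between two frames.

    `prev_frame` and `curr_frame` are HxW uint8 grayscale images. Returns an
    HxW float32 mask in [0, 1] that is large where motion is large.
    """
    # Farneback two-frame dense optical flow (polynomial expansion).
    flow = cv2.calcOpticalFlowFarneback(
        prev_frame, curr_frame, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    magnitude = np.linalg.norm(flow, axis=2)  # per-pixel motion magnitude
    # Smooth so the mask covers objects rather than isolated moving pixels
    # (an assumption about the preprocessing, not a detail from the paper).
    magnitude = cv2.GaussianBlur(magnitude, (blur_ksize, blur_ksize), 0)
    return (magnitude / (magnitude.max() + eps)).astype(np.float32)

def apply_leaky_attention(frame, mask, leak=0.25):
    """Attenuate rather than zero out unattended pixels, echoing Broadbent's
    leaky filter: attended regions pass at full strength, the rest is scaled
    down by `leak`. `leak` is a hypothetical hyperparameter."""
    weights = leak + (1.0 - leak) * mask
    return (frame.astype(np.float32) * weights).astype(np.uint8)
```

Under this setup, the masked frames would replace the raw observations in the usual frame stack fed to the DQN or A2C convolutional encoder, so the network architecture itself is left unchanged.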
