Spatio-Temporal Attention Deep Recurrent Q-Network for POMDPs

One of the long-standing challenges for reinforcement learning agents is to deal with noisy environments. Although progress has been made in producing an agent capable of optimizing its environment in fully observable conditions, partial observability still remains a difficult task. In this paper, a novel model is proposed which inspired by human perception, utilizes two fundamental machine learning concepts, attention and memory, to better confront a noisy environment.

[1]  Pascal Poupart,et al.  On Improving Deep Reinforcement Learning for POMDPs , 2017, ArXiv.

[2]  Wenjun Zeng,et al.  Spatio-Temporal Attention-Based LSTM Networks for 3D Action Recognition and Detection , 2018, IEEE Transactions on Image Processing.

[3]  Mikhail Pavlov,et al.  Deep Attention Recurrent Q-Network , 2015, ArXiv.

[4]  Shimon Whiteson,et al.  Deep Variational Reinforcement Learning for POMDPs , 2018, ICML.

[5]  Joelle Pineau,et al.  Online Planning Algorithms for POMDPs , 2008, J. Artif. Intell. Res..

[6]  J. Juola,et al.  Are spatial and temporal attention independent? , 2007, Perception & psychophysics.

[7]  Heng Tao Shen,et al.  Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning , 2017, IJCAI.

[8]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[9]  Yunchao Wei,et al.  STA: Spatial-Temporal Attention for Large-Scale Video-based Person Re-Identification , 2018, AAAI.

[10]  Anna C. Nobre,et al.  Synergistic Effect of Combined Temporal and Spatial Expectations on Visual Attention , 2005, The Journal of Neuroscience.

[11]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[12]  R. Bellman A Markovian Decision Process , 1957 .

[13]  Jeremy S. Smith,et al.  Hierarchical Multi-scale Attention Networks for action recognition , 2017, Signal Process. Image Commun..

[14]  Peter Stone,et al.  Deep Recurrent Q-Learning for Partially Observable MDPs , 2015, AAAI Fall Symposia.

[15]  Wenjun Zeng,et al.  An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data , 2016, AAAI.

[16]  Shimon Whiteson,et al.  Point-Based Planning for Multi-Objective POMDPs , 2015, IJCAI.

[17]  Alex Graves,et al.  Recurrent Models of Visual Attention , 2014, NIPS.

[18]  Demis Hassabis,et al.  Neural Episodic Control , 2017, ICML.

[19]  Paul J. Werbos,et al.  Backpropagation Through Time: What It Does and How to Do It , 1990, Proc. IEEE.

[20]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[21]  Brendan J. Frey,et al.  Learning Wake-Sleep Recurrent Attention Models , 2015, NIPS.

[22]  Bo Zhao,et al.  Where and When to Look? Spatio-temporal Attention for Action Recognition in Videos , 2018, ArXiv.

[23]  Wei Wu,et al.  End-to-End Flow Correlation Tracking with Spatial-Temporal Attention , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[25]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.