Recurrent Existence Determination Through Policy Optimization

Binary determination of the presence of objects is a problem on which humans vastly outperform computer vision systems in both speed and precision. One possible reason is that humans can skip most of the clutter and attend only to salient regions. Recurrent attention models (RAM) are the first computational models to imitate this style of human visual processing, trained via the REINFORCE algorithm. Although RAM was originally designed for image recognition, we extend it and present recurrent existence determination, an attention-based mechanism for solving the existence determination problem. Our algorithm employs a novel $k$-maximum aggregation layer and a new reward mechanism to address the issue of delayed rewards, which would otherwise destabilize the training process. Experimental analysis demonstrates significant improvements in both efficiency and accuracy over existing approaches, on both synthetic and real-world datasets.
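
The abstract does not spell out the $k$-maximum aggregation layer, but the idea it names, averaging the $k$ largest per-glimpse scores rather than taking only the single maximum, can be illustrated with a minimal sketch. The PyTorch rendering below is an assumption-laden illustration, not the paper's implementation: the class name, the choice of $k$, and the input shape are all hypothetical.

```python
import torch
import torch.nn as nn

class KMaxAggregation(nn.Module):
    """Aggregate a sequence of per-glimpse scores by averaging its k largest values.

    Compared with plain max-pooling (k = 1), averaging the top-k scores spreads
    the gradient over k time steps instead of a single one, which is one way to
    smooth the otherwise delayed, sparse reward signal described in the abstract.
    """

    def __init__(self, k: int = 3):
        super().__init__()
        self.k = k

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        # scores: (batch, time) existence scores, one per glimpse
        k = min(self.k, scores.size(1))
        topk, _ = torch.topk(scores, k, dim=1)   # (batch, k) largest scores
        return topk.mean(dim=1)                  # (batch,) aggregated logit

# Usage sketch: aggregate glimpse scores from a recurrent attention model.
glimpse_scores = torch.randn(8, 10)              # 8 images, 10 glimpses each
existence_logit = KMaxAggregation(k=3)(glimpse_scores)
```

With $k = 1$ this reduces to ordinary max aggregation; larger $k$ trades sharpness of the decision for a denser training signal.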
