Gaussian RAM: Lightweight Image Classification via Stochastic Retina-Inspired Glimpse and Reinforcement Learning

Previous studies on image classification have mainly focused on the performance of the networks, not on real-time operation or model compression. We propose a Gaussian Deep Recurrent visual Attention Model (GDRAM) - a reinforcement learning based lightweight deep neural network for large scale image classification that outperforms the conventional CNN (Convolutional Neural Network) which uses the entire image as input. Highly inspired by the biological visual recognition process, our model mimics the stochastic location of the retina with Gaussian distribution. We evaluate the model on Large cluttered MNIST, Large CIFAR-10 and Large CIFAR-100 datasets which are resized to 128 in both width and height. The implementation of Gaussian RAM in PyTorch and its pretrained model are available at : https://github.com/dsshim0125/gaussian-ram

[1]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Qingshan Liu,et al.  Learning Multiscale Deep Features for High-Resolution Satellite Image Scene Classification , 2018, IEEE Transactions on Geoscience and Remote Sensing.

[3]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[4]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[5]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[6]  Jianfei Cai,et al.  Enriched Deep Recurrent Visual Attention Model for Multiple Object Recognition , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[7]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[8]  Qingshan Liu,et al.  Learning Multi-Scale Deep Features for High-Resolution Satellite Image Classification , 2016, ArXiv.

[9]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[10]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[13]  Alex Graves,et al.  Recurrent Models of Visual Attention , 2014, NIPS.

[14]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[15]  Yann LeCun,et al.  The mnist database of handwritten digits , 2005 .

[16]  Koray Kavukcuoglu,et al.  Visual Attention , 2020, Computational Models for Cognitive Vision.

[17]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[18]  Hyuk-Jae Lee,et al.  Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).