Convolutional Residual Memory Networks

Very deep convolutional neural networks (CNNs) yield state-of-the-art results on a wide variety of visual recognition problems. A number of state-of-the-art methods for image recognition are based on networks with well over 100 layers, and the performance-versus-depth trend is moving towards networks in excess of 1000 layers. In such extremely deep architectures, the vanishing or exploding gradient problem becomes a key issue. Recent evidence also indicates that convolutional networks could benefit from an interface to explicitly constructed memory mechanisms interacting with a CNN feature-processing hierarchy. Correspondingly, we propose and evaluate a memory-enhanced convolutional neural network architecture based on augmenting convolutional residual networks with a long short-term memory (LSTM) mechanism. We refer to this as a convolutional residual memory network. To the best of our knowledge, this approach yields state-of-the-art performance on the CIFAR-100 benchmark and compares well with other state-of-the-art techniques on the CIFAR-10 and SVHN benchmarks. This is achieved using networks with more breadth, much less depth, and much less overall computation than comparable deep ResNets without the memory mechanism. Our experiments and analysis explore how the memory mechanism, network depth, and breadth affect predictive performance.
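
To make the idea concrete, below is a minimal sketch of one plausible realization, written in PyTorch as an assumption of convenience. It shows a small residual CNN whose per-block feature maps are globally pooled into a sequence of summary vectors that an LSTM consumes, with classification driven by the final memory state. The class names (ResidualBlock, ConvResidualMemoryNet), the tap points into the residual stack, and the pooling choice are all illustrative assumptions; the abstract does not specify the exact interface between the CNN hierarchy and the memory mechanism.

```python
# Hedged sketch of a convolutional residual memory network: a residual CNN
# whose per-block feature maps are summarized and fed, as a sequence, to an
# LSTM. All architectural details below are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # identity shortcut

class ConvResidualMemoryNet(nn.Module):
    def __init__(self, channels=64, n_blocks=6, hidden=128, n_classes=100):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.ModuleList(
            [ResidualBlock(channels) for _ in range(n_blocks)])
        # The LSTM memory mechanism consumes one summary vector per block.
        self.lstm = nn.LSTM(channels, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):
        x = F.relu(self.stem(x))
        summaries = []
        for block in self.blocks:
            x = block(x)
            # Global average pooling turns each block's feature map into
            # one time step for the memory mechanism (an assumed choice).
            summaries.append(x.mean(dim=(2, 3)))
        seq = torch.stack(summaries, dim=1)  # (batch, n_blocks, channels)
        _, (h_n, _) = self.lstm(seq)
        return self.fc(h_n[-1])  # classify from the final memory state

# Usage on CIFAR-sized inputs:
# logits = ConvResidualMemoryNet()(torch.randn(2, 3, 32, 32))
```

Reading the per-block summaries as a temporal sequence lets the LSTM aggregate evidence across depth, which matches the intuition above: a memory mechanism interacting with the CNN feature-processing hierarchy rather than relying on depth alone.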
