Learning Pooling for Convolutional Neural Network

Convolutional neural networks (CNNs) consist of alternating convolutional and pooling layers. A pooling layer applies a pooling operator to aggregate information within each small region of the input feature channels and then downsamples the result. Typically, hand-crafted pooling operations are used to aggregate information within a region, but they are not guaranteed to minimize the training error. To overcome this drawback, we propose a learned pooling operation, called LEAP (LEArning Pooling), obtained by end-to-end training. Specifically, our method learns one shared linear combination of the neurons in the region for each feature channel (map). In fact, average pooling can be seen as a special case of our method in which all the weights are equal. In addition, inspired by the LEAP operation, we propose a simplified convolution operation to replace the traditional convolution, which consumes many extra parameters. The simplified convolution greatly reduces the number of parameters while maintaining comparable performance. By combining the proposed LEAP method with the simplified convolution, we demonstrate state-of-the-art classification performance with a moderate number of parameters on three public object recognition benchmarks: CIFAR10, CIFAR100, and ImageNet2012.
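As the abstract describes it, LEAP learns one weight kernel per feature channel and applies that same linear combination to every pooling region in the channel; with all weights set equal, it reduces to average pooling. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the function name `leap_pool`, the strided double loop, and the `(C, k, k)` weight layout are illustrative assumptions.

```python
import numpy as np

def leap_pool(x, weights, k=2):
    """Sketch of a LEAP-style pooling step.

    x       : feature maps, shape (C, H, W)
    weights : one learned k x k kernel per channel, shape (C, k, k),
              shared across all pooling regions of that channel
    k       : pooling region size and stride (non-overlapping regions)
    """
    C, H, W = x.shape
    out = np.zeros((C, H // k, W // k))
    for c in range(C):
        for i in range(0, H - k + 1, k):
            for j in range(0, W - k + 1, k):
                # Each output value is a learned linear combination
                # of the neurons in the k x k region.
                out[c, i // k, j // k] = np.sum(x[c, i:i + k, j:j + k] * weights[c])
    return out

# Average pooling as the special case with all weights equal to 1/k^2:
x = np.arange(16.0).reshape(1, 4, 4)
w_avg = np.full((1, 2, 2), 0.25)
pooled = leap_pool(x, w_avg)  # identical to 2x2 average pooling of x
```

In training, `weights` would be ordinary parameters updated by backpropagation along with the convolutional filters, which is what makes the pooling operation end-to-end learnable.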
