UP-CNN: Un-pooling augmented convolutional neural network

Abstract: Convolutional neural networks (CNNs) have shown remarkable performance in various visual recognition tasks. Most existing CNNs are purely bottom-up, feed-forward architectures; we argue that they fail to consider the interaction between low-level fine details and high-level semantic information. In this paper, a novel “Un-Pooling augmented Convolutional Neural Network” (UP-CNN) is proposed to boost the discriminative capability of the CNN, with the following three distinctive properties: (1) UP-CNN is a deeper network comprising a bottom-up, a top-down, and then a second bottom-up sub-network, which are associated with different levels of information and jointly improve its discriminative capability. (2) With its mixture of pooling and un-pooling layers, UP-CNN readily allows interaction across convolutional layers of the same size from different sub-networks. This architecture effectively suppresses the attenuation of important information, including both activations in the forward pass and gradients during back-propagation. (3) UP-CNN employs the ratio un-pooling operation to reconstruct activations of the original size in the top-down sub-network, so that the spatial information lost during pooling can be preserved within a receptive field. Experiments on four benchmark datasets (CIFAR-10, CIFAR-100, MNIST and SVHN) demonstrate that the proposed UP-CNN architecture considerably outperforms other state-of-the-art methods.
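
As a rough illustration of how an un-pooling layer can restore activations to their pre-pooling size, the sketch below pairs 2x2 max pooling with a conventional "switch"-based max un-pooling. This is not the ratio un-pooling operation proposed for UP-CNN, whose details are not given in the abstract; it is a commonly used stand-in. The function names `max_pool_2x2` and `max_unpool_2x2` are hypothetical, and the code assumes a single 2-D feature map with even height and width.

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D list.

    Returns the pooled map and the "switches": the (row, col) position
    of each maximum in the input. Assumes even height and width.
    """
    h, w = len(x), len(x[0])
    pooled, switches = [], []
    for i in range(0, h, 2):
        pooled_row, switch_row = [], []
        for j in range(0, w, 2):
            # Collect (value, position) for the four cells of this window.
            window = [(x[i + di][j + dj], (i + di, j + dj))
                      for di in range(2) for dj in range(2)]
            value, pos = max(window, key=lambda t: t[0])
            pooled_row.append(value)
            switch_row.append(pos)
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches


def max_unpool_2x2(pooled, switches, out_shape):
    """Place each pooled value back at its recorded position; zeros elsewhere.

    Stand-in for an un-pooling layer (NOT the paper's ratio un-pooling).
    """
    h, w = out_shape
    out = [[0.0] * w for _ in range(h)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (i, j) in zip(pooled_row, switch_row):
            out[i][j] = value
    return out


feature_map = [
    [1.0, 3.0, 2.0, 0.0],
    [4.0, 2.0, 1.0, 5.0],
    [0.0, 1.0, 7.0, 2.0],
    [6.0, 2.0, 3.0, 1.0],
]
pooled, switches = max_pool_2x2(feature_map)
restored = max_unpool_2x2(pooled, switches, (4, 4))
```

Here `restored` has the same 4x4 size as `feature_map`, so it could in principle be combined with a same-sized layer from another sub-network, which is the kind of cross-layer interaction property (2) describes. How UP-CNN actually combines such layers is not specified in the abstract.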
