Paper information - UP-CNN: Un-pooling augmented convolutional neural network

Abstract: Convolutional neural networks (CNNs) have shown remarkable performance in various visual recognition tasks. Most existing CNNs are purely bottom-up, feed-forward architectures, and we argue that they fail to consider the interaction between low-level fine details and high-level semantic information. In this paper, a novel “Un-Pooling augmented Convolutional Neural Network” (UP-CNN) is proposed to boost the discriminative capability of CNNs, with three distinctive properties: (1) UP-CNN is a deeper network composed of a bottom-up sub-network, a top-down sub-network, and a second bottom-up sub-network; these are associated with different levels of information that jointly improve its discriminative capability. (2) By mixing pooling and un-pooling layers, UP-CNN allows interaction between convolutional layers of the same size in different sub-networks. This architecture reduces the attenuation of important information, both activations in the forward pass and gradients in back-propagation. (3) UP-CNN employs the ratio un-pooling operation to reconstruct activations of the original size in the top-down sub-network, so that spatial information lost during pooling can be preserved within a receptive field. Experiments on four benchmark datasets (CIFAR-10, CIFAR-100, MNIST, and SVHN) demonstrate that the proposed UP-CNN architecture considerably outperforms other state-of-the-art methods.
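The data flow the abstract describes (pool in a bottom-up path, un-pool in a top-down path, then let same-size maps interact) can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not define ratio un-pooling, so the rule used here (scale each pooled value by activation / window maximum) is only one possible reading. The function names, the element-wise sum used for the interaction, and the doubling that stands in for the convolutional layers between pooling and un-pooling are all assumptions made for illustration.

```python
# Illustrative sketch only. Function names, the "ratio" rule, and the
# element-wise sum are assumptions, not definitions taken from the paper.

def max_pool_2x2(x):
    """2x2 max pooling, stride 2, on a 2-D list with even height and width."""
    h, w = len(x), len(x[0])
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]


def ratio_unpool_2x2(pooled, reference):
    """Expand `pooled` back to the spatial size of `reference`.

    Assumed rule: every position in a 2x2 window receives the pooled value
    scaled by (reference activation / window maximum), so the relative
    layout inside the window is kept. Assumes non-negative activations,
    for example after a ReLU.
    """
    h, w = len(reference), len(reference[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            window_max = max(reference[i][j], reference[i][j + 1],
                             reference[i + 1][j], reference[i + 1][j + 1])
            for di in (0, 1):
                for dj in (0, 1):
                    # An all-zero window carries no layout information.
                    ratio = (reference[i + di][j + dj] / window_max
                             if window_max > 0 else 0.0)
                    out[i + di][j + dj] = pooled[i // 2][j // 2] * ratio
    return out


def add_same_size(a, b):
    """Element-wise sum of two maps with identical shape."""
    return [[p + q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Bottom-up path: a 4x4 activation map is pooled down to 2x2.
bottom_up = [[1.0, 2.0, 0.0, 4.0],
             [3.0, 4.0, 2.0, 0.0],
             [0.0, 0.0, 1.0, 1.0],
             [0.0, 8.0, 1.0, 2.0]]
pooled = max_pool_2x2(bottom_up)

# Placeholder for the layers that would process the pooled map.
processed = [[2.0 * v for v in row] for row in pooled]

# Top-down path: restore the 4x4 size, then combine with the same-size map.
top_down = ratio_unpool_2x2(processed, bottom_up)
fused = add_same_size(bottom_up, top_down)
```

Because the un-pooled map has the same spatial size as the earlier bottom-up map, the two can be combined directly. That is the kind of cross-sub-network interaction property (2) refers to, although the combination the paper actually uses may differ from the sum shown here.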
