Bi-branch deconvolution-based convolutional neural network for image classification

With the rise of deep neural network, convolutional neural networks show superior performances on many different computer vision recognition tasks. The convolution is used as one of the most efficient ways for extracting the details features of an image, while the deconvolution is mostly used for semantic segmentation and significance detection to obtain the contour information of the image and rarely used for image classification. In this paper, we propose a novel network named bi-branch deconvolution-based convolutional neural network (BB-deconvNet), which is constructed by mainly stacking a proposed simple module named Zoom. The Zoom module has two branches to extract multi-scale features from the same feature map. Especially, the deconvolution is borrowed to one of the branches, which can provide distinct features differently from regular convolution through the zoom of learned feature maps. To verify the effectiveness of the proposed network, we conduct several experiments on three object classification benchmarks (CIFAR-10, CIFAR-100, SVHN). The BB-deconvNet shows encouraging performances compared with other state-of-the-art deep CNNs.

[1]  Seunghoon Hong,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2]  Graham W. Taylor,et al.  Adaptive deconvolutional networks for mid and high level feature learning , 2011, 2011 International Conference on Computer Vision.

[3]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[4]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[6]  Nikos Komodakis,et al.  Wide Residual Networks , 2016, BMVC.

[7]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[8]  Yoshua Bengio,et al.  Maxout Networks , 2013, ICML.

[9]  Yoshua Bengio,et al.  Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[10]  Rob Fergus,et al.  Stochastic Pooling for Regularization of Deep Convolutional Neural Networks , 2013, ICLR.

[11]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Sergey Ioffe,et al.  Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models , 2017, NIPS.

[13]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[14]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[15]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[17]  Dima Damen,et al.  Recognizing linked events: Searching the space of feasible explanations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[19]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[20]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[21]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[22]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Shuicheng Yan,et al.  Scale-Aware Fast R-CNN for Pedestrian Detection , 2015, IEEE Transactions on Multimedia.

[25]  Jürgen Schmidhuber,et al.  Training Very Deep Networks , 2015, NIPS.

[26]  Andrew Y. Ng,et al.  Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .

[27]  Yann LeCun,et al.  Pedestrian Detection with Unsupervised Multi-stage Feature Learning , 2012, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[29]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[31]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[32]  Nikos Komodakis,et al.  DiracNets: Training Very Deep Neural Networks Without Skip-Connections , 2017, ArXiv.

[33]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[34]  Graham W. Taylor,et al.  Deconvolutional networks , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[35]  Yiran Chen,et al.  A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Yoshua Bengio,et al.  FitNets: Hints for Thin Deep Nets , 2014, ICLR.

[37]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[38]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[39]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[40]  Zhuowen Tu,et al.  Deeply-Supervised Nets , 2014, AISTATS.