ScaleNet: A Convolutional Network to Extract Multi-Scale and Fine-Grained Visual Features

Many convolutional neural networks have been proposed for image classification in recent years. Most reduce the spatial size of feature maps stage by stage, so that all feature maps within a stage share the same spatial size. This principle governs the design of most classification networks, but it leaves high-resolution feature maps semantically weak, because they always sit in the shallow layers of the network. Here, we propose a novel network architecture, ScaleNet, which consists of stacked convolution-deconvolution blocks and a multipath residual structure. Unlike most current networks, ScaleNet extracts image features through a cascaded deconstruction-reconstruction process. It generates feature maps of varying scale within each block and stage, enabling multiscale feature extraction at any depth of the network. On the CIFAR-10, CIFAR-100, and ImageNet datasets, ScaleNet achieved classification performance competitive with the state-of-the-art ResNet. In addition, ScaleNet showed a strong ability to capture semantic and fine-grained features in its high-resolution feature maps. The code is available at https://github.com/zhjpqq/scalenet.
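To make the idea of scale-variable feature maps inside one convolution-deconvolution block concrete, the following minimal sketch traces the spatial size of a feature map through a hypothetical block using the standard output-size arithmetic for strided and transposed convolutions. The kernel size, stride, padding, output padding, and block depth are illustrative assumptions, not values taken from the paper, and the function names are hypothetical.

```python
def conv_out(n, k=3, s=2, p=1):
    """Output size of a strided convolution on an n x n map (assumed k, s, p)."""
    return (n + 2 * p - k) // s + 1


def deconv_out(n, k=3, s=2, p=1, output_padding=1):
    """Output size of a transposed convolution on an n x n map (assumed k, s, p)."""
    return (n - 1) * s - 2 * p + k + output_padding


def block_sizes(n, depth=2):
    """Spatial sizes seen inside one hypothetical conv-deconv block.

    The map is downsampled `depth` times (deconstruction) and then
    upsampled `depth` times (reconstruction).
    """
    sizes = [n]
    for _ in range(depth):
        n = conv_out(n)
        sizes.append(n)
    for _ in range(depth):
        n = deconv_out(n)
        sizes.append(n)
    return sizes


if __name__ == "__main__":
    # A CIFAR-sized 32 x 32 input passing through one block of depth 2.
    print(block_sizes(32, depth=2))
```

Under these assumed settings, a single block visits several resolutions and returns to its input size, which is the property that would let such blocks be stacked at any depth.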
