Parameter Distribution Balanced CNNs

The convolutional neural network (CNN) is the primary technique behind the rapid progress of computer vision, yet there is little research on how to allocate parameters across convolution layers when designing CNNs. We focus on revealing the relationship between the CNN parameter distribution, i.e., the allocation of parameters among convolution layers, and the discriminative performance of the network. Unlike previous works, we do not append elements to the network, such as more convolution layers or denser shortcut connections; instead, we enhance the discriminative performance of a CNN by varying its parameter distribution under a strict size constraint. We propose an energy function that represents the CNN parameter distribution and thereby connects the allocation of parameters to the discriminative performance of the model. Extensive experiments with shallow CNNs on three public image classification data sets demonstrate that parameter distributions with higher energy values lead to better-performing models. Based on this observation, the problem of finding the optimal parameter distribution can be cast as the optimization problem of maximizing the energy value. We present a simple yet effective guideline for designing CNNs: balance the parameter distribution. Extensive experiments on ImageNet with three popular backbones, i.e., AlexNet, ResNet34, and ResNet101, demonstrate that the proposed guideline yields consistent improvements over different baselines under a strict size constraint.
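
The exact form of the energy function is not given in this abstract, so the sketch below is only an illustration under an assumption: an entropy-style energy over the per-layer parameter fractions, which peaks for a balanced (uniform) allocation. The helpers conv_params and energy, the layer widths, and the two example networks are hypothetical, chosen only so that both networks spend roughly the same total parameter budget.

import math

def conv_params(c_in, c_out, k):
    """Parameter count of a k x k convolution layer (weights + biases)."""
    return c_in * c_out * k * k + c_out

def energy(layer_params):
    """Hypothetical entropy-style energy over per-layer parameter
    fractions p_i = n_i / N; it is maximized when parameters are spread
    evenly across layers. A stand-in, not the paper's exact function."""
    total = sum(layer_params)
    return -sum(n / total * math.log(n / total) for n in layer_params if n > 0)

# Two 4-layer CNNs with roughly equal total budgets: one back-loaded
# (channels double each layer), one balanced (constant width).
backloaded = [conv_params(3, 16, 3), conv_params(16, 32, 3),
              conv_params(32, 64, 3), conv_params(64, 128, 3)]
balanced = [conv_params(3, 60, 3), conv_params(60, 60, 3),
            conv_params(60, 60, 3), conv_params(60, 60, 3)]

for name, layers in (("back-loaded", backloaded), ("balanced", balanced)):
    print(f"{name:11s} total={sum(layers):6d} energy={energy(layers):.3f}")

Under this stand-in energy, the balanced network scores higher (about 1.17 vs. 0.70 nats on a near-identical budget), mirroring the abstract's claim that parameter distributions with higher energy values correlate with better discriminative performance.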
