Finding the Secret of CNN Parameter Layout under Strict Size Constraint

Although deep convolutional neural networks (CNNs) have significantly boosted the performance of many computer vision tasks, their complexity (the size, or number of parameters) has also grown dramatically, often for only slight performance improvements. Larger networks demand more computation, which is unfavorable in resource-constrained scenarios such as the widely used embedded systems. In this paper, we explore the essential effect of the CNN parameter layout, i.e., the allocation of parameters across the convolution layers, on the discriminative capability of a CNN. Instead of enlarging the breadth or depth of a network, we attempt to improve its discriminative ability by changing its parameter layout under a strict size constraint. Toward this end, a novel energy function is proposed to represent the CNN parameter layout, which makes it possible to model the relationship between the allocation of parameters across the convolution layers and the discriminative ability of the CNN. Extensive experiments with plain CNN models and residual networks show that the higher the energy of a given parameter layout, the better its discriminative ability. Following this finding, we propose a novel approach to learn a better parameter layout. Experimental results on two public image classification datasets show that CNN models with the learned parameter layouts achieve better classification results under strict size constraints.
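
To make the notion of a "parameter layout under a strict size constraint" concrete, the short Python sketch below enumerates the per-layer parameter counts of a small plain CNN and checks candidate layouts against a fixed parameter budget. The layout_energy function here is a purely hypothetical proxy for illustration; it is not the energy function proposed in the paper, and the channel configurations are made-up examples.

    # Minimal sketch. Assumptions: a "layout" is the list of output-channel
    # counts of the conv layers; layout_energy is a hypothetical proxy only.
    import math

    def conv_params(in_ch, out_ch, k=3):
        """Parameter count of a k x k convolution layer (weights + biases)."""
        return in_ch * out_ch * k * k + out_ch

    def total_params(layout, in_ch=3, k=3):
        """Total parameter count of a plain CNN described by its layout."""
        total, prev = 0, in_ch
        for out_ch in layout:
            total += conv_params(prev, out_ch, k)
            prev = out_ch
        return total

    def layout_energy(layout):
        """Hypothetical energy proxy that rewards wider later layers (illustrative only)."""
        return sum(math.log(c) * (i + 1) for i, c in enumerate(layout))

    budget = 300_000  # the strict size constraint on the total number of parameters
    candidates = [
        [64, 64, 64, 64],    # uniform allocation
        [32, 64, 96, 128],   # increasing allocation
        [128, 96, 64, 32],   # decreasing allocation
    ]
    for layout in candidates:
        n = total_params(layout)
        if n <= budget:  # only layouts within the budget are compared
            print(layout, n, round(layout_energy(layout), 2))

Under such a budget, different channel allocations keep roughly the same total size while differing in how parameters are distributed across layers, which is exactly the degree of freedom the paper studies.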
