Effects of hidden layer sizing on CNN fine-tuning

Abstract

Some applications are resilient, meaning that they are robust to noise (e.g. due to errors) in the data. This property is very useful whenever an approximate computation allows the task to be performed in less time or the algorithm to be deployed on embedded hardware. Deep learning is one of the fields that can benefit from approximate computing, since its impressive generalisation ability makes it possible to reduce the large number of parameters involved. A common approach is to prune some neurons and then perform an iterative re-training, with the aim of both reducing the required memory and speeding up the inference stage. In this work we approach CNN size reduction from a different perspective: instead of pruning the network weights or searching for an approximated network very close to the Pareto frontier, we investigate whether some neurons can be removed from the fully connected layers before the network is trained, without substantially affecting its performance. As a case study, we focus on “fine-tuning”, a branch of transfer learning that has proved effective especially in domains lacking effective expert-designed features. To further compact the network, we apply weight quantization to the convolutional kernels. Results show that some layers can be tailored to reduce the network size, both in terms of the number of parameters to learn and of the required memory, without statistically affecting performance and without the need for any additional training. Finally, we investigate to what extent this sizing operation affects the network’s robustness against adversarial perturbations, a set of approaches aimed at misleading deep neural networks.
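
To make the idea concrete, the sketch below shows one way the described setup could look in PyTorch: the 4096-unit hidden layers of an ImageNet-pretrained VGG-16 classifier are replaced with narrower ones before fine-tuning, and the convolutional kernels are then uniformly quantized. The backbone (VGG-16), the reduced width of 512 units, the 8-bit uniform quantizer and the two-class target task are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained VGG-16 (hypothetical backbone choice;
# the "weights" argument requires torchvision >= 0.13).
model = models.vgg16(weights="IMAGENET1K_V1")

# Freeze the convolutional feature extractor: only the new
# fully connected layers will be learned during fine-tuning.
for p in model.features.parameters():
    p.requires_grad = False

# Replace the stock 4096-unit hidden layers with narrower ones
# *before* training, instead of pruning an already trained network.
hidden_units = 512   # assumed reduced size, not the paper's value
num_classes = 2      # assumed target task (e.g. a binary problem)
model.classifier = nn.Sequential(
    nn.Linear(512 * 7 * 7, hidden_units),
    nn.ReLU(inplace=True),
    nn.Dropout(0.5),
    nn.Linear(hidden_units, hidden_units),
    nn.ReLU(inplace=True),
    nn.Dropout(0.5),
    nn.Linear(hidden_units, num_classes),
)

# Fine-tune only the (smaller) classifier.
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)

# Illustrative uniform quantization of the convolutional kernels:
# weights are snapped to an 8-bit grid (storing the integer codes
# plus a per-layer scale is what would yield the memory saving).
def quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

with torch.no_grad():
    for m in model.features.modules():
        if isinstance(m, nn.Conv2d):
            m.weight.copy_(quantize(m.weight))
```

Since the fully connected block dominates VGG-16's parameter count, shrinking it before training already removes most of the weights that a prune-and-retrain pipeline would otherwise have to recover.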
