Dynamic Hard Pruning of Neural Networks at the Edge of the Internet

Neural Networks (NNs), although successfully applied to several Artificial Intelligence tasks, are often unnecessarily over-parametrized. In fog/edge computing, this may make their training prohibitive on resource-constrained devices, contrasting with the current trend of decentralising intelligence from remote data centres to local constrained devices. We therefore investigate the problem of training effective NN models on constrained devices with a fixed, potentially small, memory budget. We target techniques that are both resource-efficient and effective in terms of performance, while enabling significant network compression. Our technique, called Dynamic Hard Pruning (DynHP), incrementally prunes the network during training, identifying the neurons that contribute only marginally to model accuracy. DynHP enables a tunable size reduction of the final neural network and reduces the memory occupancy of the NN during training. The freed memory is reused by a \emph{dynamic batch sizing} approach that counterbalances the accuracy degradation caused by the hard pruning strategy, improving its convergence and effectiveness. We assess the performance of DynHP through reproducible experiments on two public datasets, comparing it against reference competitors. Results show that DynHP compresses an NN up to $10$ times without significant performance drops (up to $5\%$ relative error w.r.t. competitors), while reducing the training memory occupancy by up to $80\%$.
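To make the two ideas sketched in the abstract concrete, the snippet below is a minimal, illustrative PyTorch sketch of hard structured pruning during training combined with dynamic batch sizing on a toy MLP with random data. It is not the authors' implementation: the pruning schedule, the fraction pruned per step (`prune_fraction`), and helper names such as `prune_neurons` and `apply_mask` are assumptions made for illustration only.

```python
# Hedged sketch: hard pruning of low-norm hidden neurons during training,
# with the batch size grown as neurons are removed. Schedule and fractions
# are illustrative, not the DynHP settings reported in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Toy data standing in for a real dataset (e.g. MNIST / Fashion-MNIST).
X, y = torch.randn(2048, 784), torch.randint(0, 10, (2048,))
dataset = TensorDataset(X, y)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One boolean flag per hidden neuron of the first layer; False = pruned.
mask = torch.ones(256, dtype=torch.bool)

def prune_neurons(fraction):
    """Hard-prune the given fraction of still-active neurons with the
    smallest L2 weight norm: their weights are zeroed and kept at zero."""
    w = model[0].weight.data                              # shape (256, 784)
    norms = w.norm(dim=1).masked_fill(~mask, float('inf'))
    k = int(fraction * int(mask.sum()))
    if k > 0:
        idx = torch.topk(norms, k, largest=False).indices
        mask[idx] = False

def apply_mask():
    """Keep pruned neurons at zero (incoming and outgoing weights)."""
    model[0].weight.data[~mask] = 0.0
    model[0].bias.data[~mask] = 0.0
    model[2].weight.data[:, ~mask] = 0.0

base_batch, total = 64, mask.numel()
for epoch in range(10):
    # Dynamic batch sizing: as neurons are pruned, memory is nominally
    # freed, so the batch size grows with the pruned fraction.
    batch_size = int(base_batch * total / int(mask.sum()))
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(xb), yb)
        loss.backward()
        optimizer.step()
        apply_mask()                   # hard pruning: pruned weights stay zero
    if epoch % 2 == 1:                 # prune every second epoch (illustrative)
        prune_neurons(0.15)
        apply_mask()
    print(f"epoch {epoch}: batch={batch_size}, "
          f"active neurons={int(mask.sum())}, loss={loss.item():.3f}")
```

In this sketch the saved memory is only modelled indirectly (the batch size grows in proportion to the fraction of neurons removed); a real implementation on a constrained device would tie the batch-size increase to the actually measured memory freed by the pruned parameters.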
