Quantisation and Pruning for Neural Network Compression and Regularisation

Deep neural networks are typically too computationally expensive to run in real-time on consumer-grade hardware and low-powered devices. In this paper, we investigate reducing the computational and memory requirements of neural networks through network pruning and quantisation. We examine their efficacy on large networks such as AlexNet and compare it with recent compact architectures, ShuffleNet and MobileNet. Our results show that pruning and quantisation compress these networks to less than half their original size and improve their efficiency, with MobileNet in particular achieving a 7x speedup. We also demonstrate that pruning, in addition to reducing the number of parameters in a network, can help correct overfitting.
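For concreteness, the sketch below illustrates one common form of the two techniques discussed above: magnitude-based weight pruning followed by post-training dynamic quantisation in PyTorch. It is a minimal illustration rather than the exact pipeline used in the paper; the toy model, the layer types pruned, and the 50% sparsity level are illustrative assumptions, not values taken from the experiments.

```python
# Minimal sketch (assumed setup, not the paper's exact pipeline):
# magnitude-based weight pruning followed by post-training dynamic
# quantisation in PyTorch.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small stand-in classifier; the paper's targets are AlexNet,
# ShuffleNet and MobileNet.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# 1) Prune: zero out the 50% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the sparsity permanent

# 2) Quantise: store Linear weights as 8-bit integers for inference.
quantised = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The pruned-and-quantised model is used exactly like the original.
with torch.no_grad():
    logits = quantised(torch.randn(1, 1, 28, 28))
print(logits.shape)  # torch.Size([1, 10])
```

In practice the pruned network is usually fine-tuned for a few epochs before quantisation so that the remaining weights can compensate for the removed connections.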
