Model Compression and Acceleration for Deep Neural Networks: The Principles, Progress, and Challenges

In recent years, deep neural networks (DNNs) have received increasing attention and have been applied to a wide range of applications, achieving dramatic accuracy improvements in many tasks. These results rely on deep networks with millions or even billions of parameters, and the availability of graphics processing units (GPUs) with very high computation capability plays a key role in their success. For example, Krizhevsky et al. [1] achieved breakthrough results in the 2012 ImageNet Challenge using a network containing 60 million parameters, with five convolutional layers and three fully connected layers. It usually takes two to three days to train the whole model on the ImageNet data set with an NVIDIA K40 machine. In another example, the top face-verification results on the Labeled Faces in the Wild (LFW) data set were obtained with networks containing hundreds of millions of parameters, using a mix of convolutional, locally connected, and fully connected layers [2], [3]. Training such a model to reasonable performance is likewise very time-consuming. In architectures that rely only on fully connected layers, the number of parameters can grow to billions [4].
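
To make these parameter counts concrete, the short sketch below tallies the weights and biases of an AlexNet-style network, using the layer shapes reported in [1]. The helper functions and the grouped-convolution bookkeeping are illustrative assumptions for this back-of-the-envelope exercise, not code from the paper:

```python
# Back-of-the-envelope parameter count for an AlexNet-style network [1].
# Shapes follow the original paper; conv2/4/5 use two filter groups,
# so each filter sees only half of the input channels.

def conv_params(filters, kernel, in_channels, groups=1):
    """Weights + biases for a (possibly grouped) square convolutional layer."""
    return filters * (kernel * kernel * in_channels // groups) + filters

def fc_params(out_features, in_features):
    """Weights + biases for a fully connected layer."""
    return out_features * in_features + out_features

conv = [
    conv_params(96,  11, 3),              # conv1
    conv_params(256, 5,  96,  groups=2),  # conv2
    conv_params(384, 3,  256),            # conv3
    conv_params(384, 3,  384, groups=2),  # conv4
    conv_params(256, 3,  384, groups=2),  # conv5
]
fc = [
    fc_params(4096, 6 * 6 * 256),  # fc6 (flattened conv5 output)
    fc_params(4096, 4096),         # fc7
    fc_params(1000, 4096),         # fc8 (ImageNet classes)
]

total = sum(conv) + sum(fc)
print(f"conv layers: {sum(conv):,}")  # ~2.3 million
print(f"fc layers:   {sum(fc):,}")    # ~58.6 million
print(f"total:       {total:,}")      # ~61 million, matching the figure above
```

Roughly 96% of these parameters sit in the three fully connected layers, which is one reason much of the compression literature targets the fully connected part of a network first.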

References

[1] Lorien Y. Pratt et al., "Comparing Biases for Minimal Network Construction with Back-Propagation," NIPS, 1988.

[2] David S. Touretzky et al., Advances in Neural Information Processing Systems 2, 1989.

[3] Babak Hassibi et al., "Second Order Derivatives for Network Pruning: Optimal Brain Surgeon," NIPS, 1992.

[4] T. Kailath et al., "Generalized Displacement Structure for Block-Toeplitz, Toeplitz-Block, and Toeplitz-Derived Matrices," 1994.

[5] Yoshua Bengio et al., "Gradient-based learning applied to document recognition," Proc. IEEE, 1998.

[6] Rich Caruana et al., "Model compression," KDD, 2006.

[7] Vincent Vanhoucke et al., "Improving the speed of neural networks on CPUs," 2011.

[8] Marc'Aurelio Ranzato et al., "Large Scale Distributed Deep Networks," NIPS, 2012.

[9] Geoffrey E. Hinton et al., "ImageNet classification with deep convolutional neural networks," NIPS, 2012.

[10] Ebru Arisoy et al., "Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets," ICASSP, 2013.

[11] Vincent Lepetit et al., "Learning Separable Filters," CVPR, 2013.

[12] Misha Denil et al., "Predicting Parameters in Deep Learning," NIPS, 2013.

[13] Andrew Zisserman et al., "Speeding up Convolutional Neural Networks with Low Rank Expansions," BMVC, 2014.

[14] Qiang Chen et al., "Network In Network," ICLR, 2014.

[15] Ming Yang et al., "DeepFace: Closing the Gap to Human-Level Performance in Face Verification," CVPR, 2014.

[16] Yann LeCun et al., "Fast Training of Convolutional Networks through FFTs," ICLR, 2014.

[17] Rich Caruana et al., "Do Deep Nets Really Need to be Deep?," NIPS, 2014.

[18] Joan Bruna et al., "Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation," NIPS, 2014.

[19] Ming Yang et al., "Compressing Deep Convolutional Networks using Vector Quantization," arXiv, 2014.

[20] Geoffrey E. Hinton et al., "Distilling the Knowledge in a Neural Network," arXiv, 2015.

[21] Thomas Brox et al., "Striving for Simplicity: The All Convolutional Net," ICLR, 2015.

[22] Song Han et al., "Learning both Weights and Connections for Efficient Neural Network," NIPS, 2015.

[23] Le Song et al., "Deep Fried Convnets," ICCV, 2015.

[24] Shih-Fu Chang et al., "An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections," ICCV, 2015.

[25] Ivan V. Oseledets et al., "Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition," ICLR, 2015.

[26] Vivek Rathod et al., "Bayesian Dark Knowledge," NIPS, 2015.

[27] Yoshua Bengio et al., "BinaryConnect: Training Deep Neural Networks with binary weights during propagations," NIPS, 2015.

[28] R. Venkatesh Babu et al., "Data-free Parameter Pruning for Deep Neural Networks," BMVC, 2015.

[29] Pritish Narayanan et al., "Deep Learning with Limited Numerical Precision," ICML, 2015.

[30] Tara N. Sainath et al., "Structured Transforms for Small-Footprint Deep Learning," NIPS, 2015.

[32] Yoshua Bengio et al., "FitNets: Hints for Thin Deep Nets," ICLR, 2015.

[33] Dumitru Erhan et al., "Going deeper with convolutions," CVPR, 2015.

[34] Xiaogang Wang et al., "Deeply learned face representations are sparse, selective, and robust," CVPR, 2015.

[35] Andrew Zisserman et al., "Very Deep Convolutional Networks for Large-Scale Image Recognition," ICLR, 2015.

[36] Yixin Chen et al., "Compressing Neural Networks with the Hashing Trick," ICML, 2015.

[37] Ivan V. Oseledets et al., "Fast Multidimensional Convolution in Low-Rank Tensor Formats via Cross Approximation," SIAM J. Sci. Comput., 2015.

[38] Mathieu Salzmann et al., "Learning the Number of Neurons in Deep Networks," NIPS, 2016.

[39] Honglak Lee et al., "Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units," ICML, 2016.

[40] Yoshua Bengio et al., "BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1," arXiv, 2016.

[41] Tianqi Chen et al., "Net2Net: Accelerating Learning via Knowledge Transfer," ICLR, 2016.

[42] Jian Sun et al., "Deep Residual Learning for Image Recognition," CVPR, 2016.

[43] Marcin Andrychowicz et al., "Learning to learn by gradient descent by gradient descent," NIPS, 2016.

[44] Hao Zhou et al., "Less Is More: Towards Compact CNNs," ECCV, 2016.

[45] Xiaogang Wang et al., "Face Model Compression by Distilling Knowledge from Neurons," AAAI, 2016.

[46] Dharmendra S. Modha et al., "Deep neural networks are robust to weight binarization and other non-linear distortions," arXiv, 2016.

[47] Eunhyeok Park et al., "Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications," ICLR, 2016.

[48] Kilian Q. Weinberger et al., "Deep Networks with Stochastic Depth," ECCV, 2016.

[49] Misha Denil et al., "ACDC: A Structured Efficient Linear Layer," ICLR, 2016.

[50] Hugo Larochelle et al., "Dynamic Capacity Networks," ICML, 2016.

[51] Max Welling et al., "Group Equivariant Convolutional Networks," ICML, 2016.

[52] Song Han et al., "Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding," ICLR, 2016.

[53] Yoshua Bengio et al., "Neural Networks with Few Multiplications," ICLR, 2016.

[54] Yiran Chen et al., "Learning Structured Sparsity in Deep Neural Networks," NIPS, 2016.

[55] Masakazu Iwamura et al., "Deep Pyramidal Residual Networks with Separated Stochastic Depth," arXiv, 2016.

[56] Victor S. Lempitsky et al., "Fast ConvNets Using Group-Wise Brain Damage," CVPR, 2016.

[57] Koray Kavukcuoglu et al., "Exploiting Cyclic Symmetry in Convolutional Neural Networks," ICML, 2016.

[58] Ling Shao et al., "Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition," IEEE Trans. Pattern Anal. Mach. Intell., 2016.

[59] Zhongfei Zhang et al., "Doubly Convolutional Neural Networks," NIPS, 2016.

[60] Jian Cheng et al., "Quantized Convolutional Neural Networks for Mobile Devices," CVPR, 2016.

[61] Xiaogang Wang et al., "Convolutional neural networks with low-rank regularization," ICLR, 2016.

[62] Andrew Lavin et al., "Fast Algorithms for Convolutional Neural Networks," CVPR, 2016.

[63] Mohammad Rastegari et al., "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks," ECCV, 2016.

[64] Xiaogang Wang et al., "Multi-Bias Non-linear Activation in Deep Neural Networks," ICML, 2016.

[65] Yu Cheng et al., "Fully-Adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification," CVPR, 2017.

[66] James T. Kwok et al., "Loss-aware Binarization of Deep Networks," ICLR, 2017.

[67] Geoffrey E. Hinton et al., "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer," ICLR, 2017.

[68] Quoc V. Le et al., "HyperNetworks," ICLR, 2017.

[69] Max Welling et al., "Soft Weight-Sharing for Neural Network Compression," ICLR, 2017.

[70] Jungwon Lee et al., "Towards the Limit of Network Quantization," ICLR, 2017.

[71] Forrest N. Iandola et al., "SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving," CVPR Workshops, 2017.

[72] Sergey Ioffe et al., "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning," AAAI, 2017.

[73] Hanan Samet et al., "Pruning Filters for Efficient ConvNets," ICLR, 2017.

[74] Song Han et al., "Trained Ternary Quantization," ICLR, 2017.

[75] Yu Cheng et al., "S3Pool: Pooling with Stochastic Spatial Sampling," CVPR, 2017.

[76] Dacheng Tao et al., "Beyond Filters: Compact Feature Map for Portable Deep Model," ICML, 2017.

[77] Nikos Komodakis et al., "Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer," ICLR, 2017.