A comprehensive survey on model compression and acceleration

In recent years, machine learning (ML) and deep learning (DL) have shown remarkable improvements in computer vision, natural language processing, stock prediction, forecasting, and audio processing, to name a few. The DL models trained for these complex tasks are large, which makes them difficult to deploy on resource-constrained devices. For instance, the pre-trained VGG16 model trained on the ImageNet dataset is more than 500 MB in size. Resource-constrained devices such as mobile phones and Internet of Things (IoT) devices have limited memory and computation power, yet for real-time applications the trained models must be deployed on such devices. Popular convolutional neural network models have millions of parameters, which leads to large trained models. Hence, it becomes essential to compress and accelerate these models before deploying them on resource-constrained devices, while compromising model accuracy as little as possible. Retaining the same accuracy after compressing a model is a challenging task. To address this challenge, many researchers have, over the last couple of years, suggested different techniques for model compression and acceleration. In this paper, we present a survey of the various techniques suggested for compressing and accelerating ML and DL models. We also discuss the challenges of the existing techniques and provide future research directions in the field.
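The 500 MB figure for VGG16 follows directly from its parameter count: roughly 138 million parameters stored as 32-bit floats. The following minimal sketch, assuming PyTorch and torchvision (version 0.13 or later for the `weights=None` argument) are available, reproduces that back-of-the-envelope estimate; the model name and byte arithmetic are illustrative, not part of the survey itself.

```python
# Minimal sketch: estimate the on-disk size of VGG16 from its parameter count,
# illustrating why the pre-trained model exceeds 500 MB at float32 precision.
import torchvision.models as models

model = models.vgg16(weights=None)  # architecture only; no weight download needed
num_params = sum(p.numel() for p in model.parameters())  # ~138 million parameters
size_mb = num_params * 4 / (1024 ** 2)  # 4 bytes per float32 parameter

print(f"Parameters: {num_params / 1e6:.1f} M")
print(f"Estimated float32 size: {size_mb:.0f} MB")  # roughly 527 MB
```

The same arithmetic motivates the compression techniques surveyed here: halving the precision or pruning half the parameters roughly halves this footprint.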
