STQ-Nets: Unifying Network Binarization and Structured Pruning

We propose a formulation for network compression that combines two major paradigms: binarization and pruning. Past work on network binarization has demonstrated that networks are robust to the removal of activation/weight magnitude information and can perform comparably to full-precision networks using signs alone. Pruning focuses on generating efficient, sparse networks. Both compression paradigms aid deployment in portable settings, where storage, compute, and power are limited. We argue that these paradigms are complementary and can be combined to offer high levels of compression and speedup without significant accuracy loss. Intuitively, weights and activations closer to zero incur higher binarization error, making them good candidates for pruning. Our formulation incorporates the speedups of binary convolution algorithms through structured pruning, enabling the pruned parts of the network to be removed entirely post-training, and beats previous works attempting the same by a significant margin. Overall, our method brings up to 89x layer-wise compression over the corresponding full-precision networks, with only a 0.33% accuracy loss on CIFAR-10 using ResNet-18 at a 40% PFR (Prune Factor Ratio for filters), and a 0.3% loss on ImageNet using ResNet-18 at a 19% PFR.
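To make the intuition concrete, the sketch below ranks convolutional filters by their binarization error and marks the worst fraction for structured (whole-filter) removal. This is a minimal NumPy illustration, not the paper's exact criterion: it assumes an XNOR-Net-style approximation W ≈ α·sign(W) with, for clarity, a single layer-wise scaling factor α = mean(|W|), and the function names (`binarization_error`, `select_prune_candidates`) are hypothetical. Because a weight near zero is mapped to ±α, it contributes a reconstruction error of roughly α despite contributing almost nothing to the output, so filters dominated by near-zero weights rank worst.

```python
import numpy as np

def binarization_error(filters, alpha):
    """Per-filter L2 error of the binary approximation alpha * sign(W).
    A weight near zero binarizes to +/-alpha, so it contributes an error
    of roughly alpha; filters dominated by near-zero weights rank worst."""
    return np.array([np.linalg.norm(f - alpha * np.sign(f)) for f in filters])

def select_prune_candidates(filters, pfr=0.4):
    """Mark the `pfr` fraction of filters with the highest binarization
    error for structured (whole-filter) removal."""
    alpha = np.abs(filters).mean()        # shared layer-wise scaling factor
    errors = binarization_error(filters, alpha)
    n_prune = int(pfr * len(filters))
    return np.argsort(errors)[-n_prune:]  # indices of filters to drop

# Toy usage: one conv layer with 64 filters of shape (in_ch=3, 3, 3).
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(64, 3, 3, 3))
weights[:8] *= 0.01  # eight filters concentrated near zero
to_prune = select_prune_candidates(weights, pfr=0.4)
print(f"Pruning {len(to_prune)}/64 filters, e.g. indices {sorted(to_prune)[:10]}")
```

Pruning at filter granularity, rather than individual weights, is what allows the binarized network to retain its speed advantage: entire output channels (and the corresponding input channels of the next layer) can be deleted from the packed bit tensors after training, so the XNOR/popcount convolution kernels operate only on the surviving channels with no sparse indexing overhead.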
