ReCU: Reviving the Dead Weights in Binary Neural Networks

Binary neural networks (BNNs) have received increasing attention due to their substantial savings in computation and memory. Most existing works focus either on reducing the quantization error by minimizing the gap between the full-precision weights and their binarized counterparts, or on designing a gradient approximation to mitigate the gradient mismatch, while leaving the “dead weights” untouched, which slows down the convergence of BNN training. In this paper, we explore for the first time the influence of the “dead weights”, a group of weights that are barely updated during the training of BNNs, and introduce the rectified clamp unit (ReCU) to revive them for updating. We prove that reviving the “dead weights” with ReCU results in a smaller quantization error. We further take into account the information entropy of the weights and mathematically analyze why weight standardization benefits BNNs. We reveal an inherent contradiction between minimizing the quantization error and maximizing the information entropy, and propose an adaptive exponential scheduler to identify the range of the “dead weights”. By taking the “dead weights” into account, our method offers not only faster BNN training but also state-of-the-art performance on CIFAR-10 and ImageNet compared with recent methods. Code is available at https://github.com/z-hXu/ReCU .
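As an illustrative sketch of the core idea, the snippet below clamps latent weights into a quantile-bounded range so that tail weights, whose signs rarely flip under ordinary gradient updates, fall back into an updatable region. The quantile-based bounds, the threshold `tau`, the scheduler endpoints, and all function names here are assumptions for illustration; the exact formulation of ReCU and its scheduler are given in the paper and the linked repository.

```python
import torch

def recu_sketch(w: torch.Tensor, tau: float = 0.9) -> torch.Tensor:
    """Hedged sketch of a rectified clamp: confine latent weights to the
    [Q(1 - tau), Q(tau)] quantile range so tail ("dead") weights re-enter
    a region where sign flips are reachable. The quantile bounds are an
    illustrative assumption, not the paper's exact definition."""
    lo = torch.quantile(w, 1.0 - tau).item()  # lower bound: (1 - tau) quantile
    hi = torch.quantile(w, tau).item()        # upper bound: tau quantile
    return torch.clamp(w, min=lo, max=hi)

def tau_schedule(epoch: int, total_epochs: int,
                 tau_min: float = 0.85, tau_max: float = 0.99) -> float:
    """Illustrative exponential ramp of tau across training; the endpoints
    and curve shape are assumptions, not the published schedule."""
    t = epoch / max(total_epochs - 1, 1)
    return tau_min * (tau_max / tau_min) ** t
```

A training loop might binarize `sign(recu_sketch(w, tau))` in the forward pass while updating the latent `w`, widening `tau` via `tau_schedule` as epochs progress. A tighter clamp shrinks the quantization error, while a looser one preserves more weight entropy; this is the trade-off the paper's adaptive exponential scheduler is designed to balance.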
