Distillation-Guided Residual Learning for Binary Convolutional Neural Networks

It is challenging to bridge the performance gap between a Binary CNN (BCNN) and a Floating-point CNN (FCNN). We observe that this performance gap leads to substantial residuals between the intermediate feature maps of BCNN and FCNN. To minimize the performance gap, we enforce BCNN to produce intermediate feature maps similar to those of FCNN. This training strategy, i.e., optimizing each binary convolutional block with a block-wise distillation loss derived from FCNN, leads to a more effective optimization of BCNN. It also motivates us to update the binary convolutional block architecture to facilitate the optimization of the block-wise distillation loss. Specifically, a lightweight shortcut branch is inserted into each binary convolutional block to complement the residuals at each block. Benefiting from its Squeeze-and-Interaction (SI) structure, this shortcut branch introduces only a small fraction of parameters, e.g., a 10% overhead, yet effectively complements the residuals. Extensive experiments on ImageNet demonstrate the superior performance of our method in both classification efficiency and accuracy, e.g., a BCNN trained with our method achieves an accuracy of 60.45% on ImageNet.
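
To make the block-wise objective concrete, below is a minimal PyTorch-style sketch of a block-wise distillation loss. The names blockwise_distillation_loss, bcnn_blocks, and fcnn_blocks are hypothetical illustration rather than the paper's published interface; the sketch assumes a frozen full-precision teacher and that corresponding blocks produce feature maps of matching shape.

```python
import torch
import torch.nn.functional as F

def blockwise_distillation_loss(bcnn_blocks, fcnn_blocks, x):
    """Accumulate an MSE penalty between each binary block's output and
    the corresponding frozen FCNN block's output (hypothetical interface:
    both arguments are aligned lists of per-block nn.Modules)."""
    loss = 0.0
    xb, xf = x, x
    for b_blk, f_blk in zip(bcnn_blocks, fcnn_blocks):
        xb = b_blk(xb)                       # student (binary) block
        with torch.no_grad():                # teacher FCNN is frozen
            xf = f_blk(xf)
        loss = loss + F.mse_loss(xb, xf)     # per-block residual penalty
    return loss
```

Because each block receives its own supervision signal, gradients do not have to travel through the full stack of binarized layers before reaching early blocks, which is what makes the optimization more effective than a single end-of-network loss.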
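
The lightweight shortcut branch can be sketched in a similar spirit. The exact Squeeze-and-Interaction layout is not detailed in the abstract, so the sketch below assumes a squeeze-then-interact pattern akin to Squeeze-and-Excitation; the class name SIShortcut, the reduction ratio, and the forward signature are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class SIShortcut(nn.Module):
    """Hypothetical sketch of a lightweight shortcut branch: squeeze the
    input to a channel descriptor, run a cheap interaction, and add a
    rescaled copy of the input to the binary block's output."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)      # squeeze: global context
        self.interact = nn.Sequential(              # interaction: small FC pair
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x, block_out):
        # Assumes x and block_out have matching shapes; a strided block
        # would additionally need a projection on the shortcut path.
        b, c, _, _ = x.shape
        w = self.interact(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        return block_out + x * w                    # complement the residuals
```

With a reduction ratio of 8, the interaction adds on the order of c^2/4 parameters per block, far fewer than a full-precision convolution, which is how such a branch can stay within a small overhead on the order of 10%.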
