High-Capacity Expert Binary Networks

Network binarization is a promising hardware-aware direction for creating efficient deep models. Despite its memory and computational advantages, reducing the accuracy gap between such models and their real-valued counterparts remains a challenging, unsolved research problem. To this end, we make the following three contributions: (a) To increase model capacity, we propose Expert Binary Convolution, which, for the first time, tailors conditional computing to binary networks by learning to select a single data-specific expert binary filter at a time, conditioned on the input features. (b) To increase representation capacity, we address the inherent information bottleneck in binary networks by introducing an efficient width expansion mechanism that keeps the number of binary operations within the same budget. (c) To improve network design, we propose a principled binary network growth mechanism that unveils a set of network topologies with favorable properties. Overall, our method improves upon prior work by ~6% with no increase in computational cost, reaching a groundbreaking ~71% top-1 accuracy on ImageNet classification.
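The core idea behind contribution (a) can be illustrated with a minimal PyTorch-style sketch: a lightweight gate, computed from globally pooled input features, selects among several expert banks of binary filters (softly during training, hard at inference), and the chosen binary filters convolve the binarized activations. The class and parameter names below (ExpertBinaryConv, num_experts, the gating design, and the straight-through binarizer) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator (STE) backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Map to {-1, +1} (plain torch.sign would map 0 to 0).
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Standard STE: pass gradients only where |x| <= 1.
        return grad_output * (x.abs() <= 1).float()


class ExpertBinaryConv(nn.Module):
    """Selects one expert binary filter bank per sample, conditioned on its features."""

    def __init__(self, in_ch, out_ch, num_experts=4, kernel_size=3, padding=1):
        super().__init__()
        # One bank of filters per expert: (E, out_ch, in_ch, k, k).
        self.experts = nn.Parameter(
            0.01 * torch.randn(num_experts, out_ch, in_ch, kernel_size, kernel_size)
        )
        # Lightweight gate operating on globally pooled input features.
        self.gate = nn.Linear(in_ch, num_experts)
        self.padding = padding

    def forward(self, x):
        pooled = x.mean(dim=(2, 3))            # (B, in_ch)
        logits = self.gate(pooled)             # (B, num_experts)
        weights = F.softmax(logits, dim=1)     # soft relaxation for training
        if not self.training:
            # Hard selection at inference: exactly one expert per sample.
            weights = F.one_hot(logits.argmax(dim=1), logits.size(1)).float()

        xb = BinarizeSTE.apply(x)              # binarized activations
        wb = BinarizeSTE.apply(self.experts)   # binarized expert filters

        # For clarity, combine expert filters per sample and convolve; a grouped
        # convolution could batch this more efficiently.
        outs = []
        for b in range(x.size(0)):
            w = (weights[b].view(-1, 1, 1, 1, 1) * wb).sum(dim=0)  # (out_ch, in_ch, k, k)
            outs.append(F.conv2d(xb[b:b + 1], w, padding=self.padding))
        return torch.cat(outs, dim=0)
```

For example, ExpertBinaryConv(64, 64, num_experts=4) applied to a (2, 64, 16, 16) tensor returns a (2, 64, 16, 16) tensor, adding only a small fully connected gate on top of the binary convolution itself; at inference the one-hot gate keeps the selected filters strictly binary, which is the point of selecting one expert at a time.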
