Demystifying and Generalizing BinaryConnect

BinaryConnect (BC) and its many variations have become the de facto standard for neural network quantization. However, our understanding of the inner workings of BC remains limited. We address this gap in four ways: (a) we show that existing quantization algorithms, including post-training quantization, are surprisingly similar to one another; (b) we argue for proximal maps as a natural family of quantizers that is easy both to design and to analyze; (c) we refine the observation that BC is a special case of dual averaging, which itself is a special case of the generalized conditional gradient algorithm; (d) building on these connections, we propose ProxConnect (PC) as a generalization of BC and prove its convergence properties. We conduct experiments on CIFAR-10 and ImageNet, and verify that PC achieves competitive performance.
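To make the update rule behind BC and PC concrete, here is a minimal sketch on a toy quadratic objective. It assumes plain SGD; the function names (`sign_quantizer`, `prox_quantizer`, `prox_connect_step`), the particular piecewise-linear proximal quantizer, and all hyperparameters are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def sign_quantizer(w):
    # BinaryConnect's hard quantizer: project each weight onto {-1, +1}.
    return np.sign(w)

def prox_quantizer(w, rho=0.9):
    # An illustrative proximal-map quantizer: the proximal map of
    # r(x) = -(rho/2) * x**2 + (indicator of [-1, 1]). It interpolates
    # between the identity (rho = 0) and the hard sign map (rho -> 1).
    return np.clip(w / (1.0 - rho), -1.0, 1.0)

def prox_connect_step(w, grad_fn, lr=0.05, quantizer=sign_quantizer):
    # One (Prox)Connect update: the gradient is evaluated at the
    # *quantized* weights but applied to the latent continuous weights
    # (with BC's usual clipping to keep the latent weights bounded).
    w_q = quantizer(w)
    g = grad_fn(w_q)
    return np.clip(w - lr * g, -1.0, 1.0)

# Toy usage: quantize the minimizer of the quadratic loss ||w - t||^2.
t = np.array([0.7, -1.2, 0.1])
grad_fn = lambda w: 2.0 * (w - t)   # gradient of the toy loss
w = np.zeros(3)
for _ in range(200):
    w = prox_connect_step(w, grad_fn, quantizer=prox_quantizer)
print(prox_quantizer(w))            # -> roughly [0.7, -1.0, 0.1]
```

The design choice shared by BC and PC is that the forward and gradient computations use the quantized weights while the update is applied to the latent continuous weights; PC generalizes BC by replacing the hard sign with an arbitrary proximal map.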
