ProxQuant: Quantized Neural Networks via Proximal Operators

To make deep neural networks feasible in resource-constrained environments (such as mobile devices), it is beneficial to quantize models by using low-precision weights. One common technique for quantizing neural networks is the straight-through gradient method, which enables back-propagation through the quantization mapping. Despite its empirical success, little is understood about why the straight-through gradient method works. Building on the novel observation that the straight-through gradient method is in fact identical to Nesterov's well-known dual-averaging algorithm applied to a quantization-constrained optimization problem, we propose a more principled alternative, ProxQuant, which instead formulates quantized network training as a regularized learning problem and optimizes it with the proximal gradient method. ProxQuant performs back-propagation on the underlying full-precision weights and applies an efficient proximal operator between stochastic gradient steps to encourage quantizedness. For quantizing ResNets and LSTMs, ProxQuant outperforms state-of-the-art results on binary quantization and is on par with the state of the art on multi-bit quantization. For binary quantization, we show both theoretically and experimentally that ProxQuant is more stable than the straight-through gradient method (i.e., BinaryConnect), challenging the indispensability of the straight-through gradient method and providing a powerful alternative.
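
To make the update concrete, below is a minimal, self-contained sketch of the ProxQuant idea for binary quantization: take a stochastic gradient step on the full-precision weights, then apply a proximal operator that softly pulls each weight toward the nearest point in {-1, +1}. The specific W-shaped L1-style regularizer, the linear regularization schedule, and the toy objective are illustrative assumptions, not an exact reproduction of the paper's recipe.

```python
# Sketch of a ProxQuant-style update for binary quantization (assumptions noted above).
import numpy as np

def prox_binary(theta, lam):
    """Prox of lam * sum_j min(|theta_j - 1|, |theta_j + 1|):
    soft-threshold each coordinate toward its nearest value in {-1, +1}."""
    target = np.sign(theta)            # nearest binary value
    target[target == 0] = 1.0          # break ties at zero arbitrarily
    diff = theta - target
    return target + np.sign(diff) * np.maximum(np.abs(diff) - lam, 0.0)

def proxquant_step(theta, grad, lr, lam):
    """One update: gradient step on the full-precision weights, then the prox step."""
    theta = theta - lr * grad          # back-propagation acts on the full-precision vector
    return prox_binary(theta, lam)     # prox operator encourages quantizedness

# Toy usage: minimize ||theta - data||^2 while pushing theta toward {-1, +1}.
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
data = np.array([0.9, -1.2, 0.3, -0.1, 2.0])
for t in range(200):
    grad = 2.0 * (theta - data)
    lam = 1e-3 * t                     # gradually increasing regularization strength (assumed schedule)
    theta = proxquant_step(theta, grad, lr=0.05, lam=lam)
print(np.round(theta, 3))              # weights end up (near-)binary; a final hard sign() finishes quantization
```

In contrast to the straight-through estimator, which back-propagates through a hard quantizer, this scheme never discards the full-precision iterate; the regularization strength controls how aggressively the weights are driven toward the binary set over the course of training.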
