LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update

Representing deep neural networks (DNNs) in low precision is a promising approach to enable efficient acceleration and memory reduction. Previous methods that train DNNs in low precision typically keep a high-precision copy of the weights for the weight updates. Directly training with low-precision weights leads to accuracy degradation due to complex interactions between the low-precision number system and the learning algorithm. To address this issue, we develop a co-designed low-precision training framework, termed LNS-Madam, in which we jointly design a logarithmic number system (LNS) and a multiplicative weight update algorithm (Madam). We prove that LNS-Madam results in low quantization error during weight updates, leading to stable performance even when precision is limited. We further propose a hardware design of LNS-Madam that resolves practical challenges in implementing an efficient datapath for LNS computations. Our implementation effectively reduces the energy overhead incurred by LNS-to-integer conversion and partial sum accumulation. Experimental results show that LNS-Madam achieves accuracy comparable to full-precision counterparts with only 8 bits on popular computer vision and natural language tasks. Compared to FP32 and FP8, LNS-Madam reduces energy consumption by over 90% and 55%, respectively.
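
The key idea is that a multiplicative weight update becomes an additive update on the log-magnitude of a weight, so weights stored in an LNS can be updated directly in the log domain and re-quantized at low precision. The following is a minimal sketch of this interaction, assuming a Madam-style gradient normalization and a base-2 LNS with a fixed number of fractional log bits; the function names (e.g., lns_madam_step, quantize_log) and the choice of per-tensor RMS normalization are illustrative assumptions, not the authors' implementation.

    import torch

    def quantize_log(log_mag, frac_bits=3):
        """Round a log2 magnitude to a fixed number of fractional bits,
        emulating a low-precision logarithmic number system (LNS)."""
        scale = 2 ** frac_bits
        return torch.round(log_mag * scale) / scale

    @torch.no_grad()
    def lns_madam_step(sign, log_mag, grad, lr=0.01, frac_bits=3):
        """One illustrative LNS-Madam-style update.

        Weights are stored as (sign, log2 |w|). A multiplicative update
        w <- w * exp2(-lr * sign(w) * g_hat) becomes an *additive* update
        on log_mag, which is then re-quantized to the LNS precision.
        """
        # Per-tensor RMS normalization of the gradient (Madam-style).
        g_hat = grad / (grad.pow(2).mean().sqrt() + 1e-12)
        # Multiplicative in the linear domain == additive in the log domain.
        log_mag = log_mag - lr * sign * g_hat
        return sign, quantize_log(log_mag, frac_bits)

    # Toy usage: a single weight tensor and a synthetic gradient.
    w = torch.randn(4)
    sign, log_mag = torch.sign(w), torch.log2(w.abs())
    sign, log_mag = lns_madam_step(sign, log_mag, grad=torch.randn(4))
    w_new = sign * torch.exp2(log_mag)  # back to linear domain for inspection

Because the update never leaves the log domain, no high-precision linear-domain weight copy is needed, which is the property the framework exploits to keep quantization error low at limited precision.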
