Evaluations on Deep Neural Networks Training Using Posit Number System

The training of Deep Neural Networks (DNNs) imposes enormous memory requirements and computational complexity, making it challenging to train DNN models on resource-constrained devices. Training DNNs with reduced-precision data representations is crucial to mitigating this problem. In this article, we conduct a thorough investigation of training DNNs with low-bit posit numbers, a Type-III universal number (Unum) format. Through a comprehensive analysis of quantization with various data formats, we demonstrate that the posit format shows great potential for DNN training. Moreover, we propose a DNN training framework using 8-bit posits with a novel tensor-wise scaling scheme. Experiments show performance on par with the state-of-the-art (SOTA) across multiple datasets (MNIST, CIFAR-10, ImageNet, and Penn Treebank) and model architectures (LeNet-5, AlexNet, ResNet, MobileNet-V2, and LSTM). We further design an energy-efficient hardware prototype for our framework. Compared with its standard floating-point counterpart, our design achieves reductions of 68, 51, and 75 percent in area, power, and memory capacity, respectively.
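
The abstract does not spell out the quantizer, so the following is only a minimal NumPy sketch of what 8-bit posit decoding and a tensor-wise scaling step could look like. The choice of es = 2, the function names, and the rule of scaling by the tensor's maximum magnitude are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

ES = 2          # assumed exponent-field size for posit(8, es=2)
NBITS = 8

def decode_posit8(bits: int) -> float:
    """Decode an 8-bit posit (es = 2) bit pattern into a Python float."""
    if bits == 0:
        return 0.0
    if bits == 0x80:                       # Not-a-Real (NaR)
        return float('nan')
    sign = -1.0 if bits & 0x80 else 1.0
    if bits & 0x80:                        # negate via two's complement
        bits = (-bits) & 0xFF
    # Bits after the sign, MSB first.
    rest = [(bits >> i) & 1 for i in range(NBITS - 2, -1, -1)]
    # Regime: run of identical bits terminated by the opposite bit.
    r0 = rest[0]
    run = 1
    while run < len(rest) and rest[run] == r0:
        run += 1
    regime = (run - 1) if r0 == 1 else -run
    idx = run + 1                          # skip the terminating bit
    # Exponent: up to ES bits; missing bits are treated as zero.
    exp = 0
    for _ in range(ES):
        exp <<= 1
        if idx < len(rest):
            exp |= rest[idx]
            idx += 1
    # Remaining bits form the fraction.
    frac = sum(b / (1 << (i + 1)) for i, b in enumerate(rest[idx:]))
    useed = 1 << (1 << ES)                 # 2^(2^es) = 16 for es = 2
    return sign * (useed ** regime) * (2.0 ** exp) * (1.0 + frac)

# All representable posit(8, 2) values (NaR excluded), sorted for lookup.
POSIT8_VALUES = np.array(sorted(decode_posit8(p) for p in range(256)
                                if p != 0x80), dtype=np.float64)

def quantize_tensor(x: np.ndarray):
    """Tensor-wise scaling followed by round-to-nearest posit(8, 2).

    The tensor is scaled so its largest magnitude lands at 1.0, where
    posits are densest, then each element is snapped to the nearest
    representable posit value. Returns the quantized tensor and the scale.
    """
    scale = float(np.max(np.abs(x))) or 1.0
    scaled = x / scale
    idx = np.clip(np.searchsorted(POSIT8_VALUES, scaled),
                  1, len(POSIT8_VALUES) - 1)
    lo, hi = POSIT8_VALUES[idx - 1], POSIT8_VALUES[idx]
    q = np.where(np.abs(scaled - lo) <= np.abs(hi - scaled), lo, hi)
    return q, scale

# Example: quantize a weight tensor and recover the values used in training.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_tensor(w)
w_hat = q * s
```

Scaling by the per-tensor maximum magnitude is one common way to exploit the posit format's tapered accuracy, which peaks around 1.0; the paper's tensor-wise scaling scheme may use a different statistic or update rule.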
