Evaluations on Deep Neural Networks Training Using Posit Number System

The training of Deep Neural Networks (DNNs) imposes enormous memory requirements and computational complexity, making it challenging to train DNN models on resource-constrained devices. Training DNNs with reduced-precision data representations is crucial to mitigating this problem. In this article, we conduct a thorough investigation of training DNNs with low-bit posit numbers, a Type-III universal number (Unum) format. Through a comprehensive analysis of quantization with various data formats, we demonstrate that the posit format shows great potential for DNN training. Moreover, we propose a DNN training framework using 8-bit posits with a novel tensor-wise scaling scheme. Experiments show performance on par with the state-of-the-art (SOTA) across multiple datasets (MNIST, CIFAR-10, ImageNet, and Penn Treebank) and model architectures (LeNet-5, AlexNet, ResNet, MobileNet-V2, and LSTM). We further design an energy-efficient hardware prototype for our framework. Compared with its standard floating-point counterpart, our design achieves reductions of 68, 51, and 75 percent in area, power, and memory capacity, respectively.
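
The abstract does not spell out the quantizer, so the following is only a minimal NumPy sketch of what 8-bit posit decoding and a tensor-wise scaling step could look like. The choice of es = 2, the function names, and the rule of scaling by the tensor's maximum magnitude are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

ES = 2          # assumed exponent-field size for posit(8, es=2)
NBITS = 8

def decode_posit8(bits: int) -> float:
    """Decode an 8-bit posit (es = 2) bit pattern into a Python float."""
    if bits == 0:
        return 0.0
    if bits == 0x80:                       # Not-a-Real (NaR)
        return float('nan')
    sign = -1.0 if bits & 0x80 else 1.0
    if bits & 0x80:                        # negate via two's complement
        bits = (-bits) & 0xFF
    # Bits after the sign, MSB first.
    rest = [(bits >> i) & 1 for i in range(NBITS - 2, -1, -1)]
    # Regime: run of identical bits terminated by the opposite bit.
    r0 = rest[0]
    run = 1
    while run < len(rest) and rest[run] == r0:
        run += 1
    regime = (run - 1) if r0 == 1 else -run
    idx = run + 1                          # skip the terminating bit
    # Exponent: up to ES bits; missing bits are treated as zero.
    exp = 0
    for _ in range(ES):
        exp <<= 1
        if idx < len(rest):
            exp |= rest[idx]
            idx += 1
    # Remaining bits form the fraction.
    frac = sum(b / (1 << (i + 1)) for i, b in enumerate(rest[idx:]))
    useed = 1 << (1 << ES)                 # 2^(2^es) = 16 for es = 2
    return sign * (useed ** regime) * (2.0 ** exp) * (1.0 + frac)

# All representable posit(8, 2) values (NaR excluded), sorted for lookup.
POSIT8_VALUES = np.array(sorted(decode_posit8(p) for p in range(256)
                                if p != 0x80), dtype=np.float64)

def quantize_tensor(x: np.ndarray):
    """Tensor-wise scaling followed by round-to-nearest posit(8, 2).

    The tensor is scaled so its largest magnitude lands at 1.0, where
    posits are densest, then each element is snapped to the nearest
    representable posit value. Returns the quantized tensor and the scale.
    """
    scale = float(np.max(np.abs(x))) or 1.0
    scaled = x / scale
    idx = np.clip(np.searchsorted(POSIT8_VALUES, scaled),
                  1, len(POSIT8_VALUES) - 1)
    lo, hi = POSIT8_VALUES[idx - 1], POSIT8_VALUES[idx]
    q = np.where(np.abs(scaled - lo) <= np.abs(hi - scaled), lo, hi)
    return q, scale

# Example: quantize a weight tensor and recover the values used in training.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_tensor(w)
w_hat = q * s
```

Scaling by the per-tensor maximum magnitude is one common way to exploit the posit format's tapered accuracy, which peaks around 1.0; the paper's tensor-wise scaling scheme may use a different statistic or update rule.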
