Training Deep Neural Networks with 8-bit Floating Point Numbers

The state-of-the-art hardware platforms for training deep neural networks are moving from traditional single precision (32-bit) computations towards 16 bits of precision, in large part due to the high energy efficiency and smaller bit storage associated with reduced-precision representations. However, unlike inference, training with numbers represented with fewer than 16 bits has been challenging due to the need to maintain fidelity of the gradient computations during back-propagation. Here we demonstrate, for the first time, the successful training of deep neural networks using 8-bit floating point numbers while fully maintaining accuracy on a spectrum of deep learning models and datasets. In addition to reducing the data and computation precision to 8 bits, we also successfully reduce the arithmetic precision for additions (used in partial product accumulation and weight updates) from 32 bits to 16 bits through the introduction of a number of key ideas, including chunk-based accumulation and floating point stochastic rounding. The use of these novel techniques lays the foundation for a new generation of hardware training platforms with the potential for 2-4 times improved throughput over today's systems.
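As a rough software illustration of the two accumulation ideas named above, the sketch below simulates chunk-based accumulation and floating point stochastic rounding in NumPy. This is not the paper's implementation: the function names, the chunk size of 64, the test data, and the use of float16 as a stand-in for the paper's 16-bit accumulation format are all illustrative assumptions.

```python
# Minimal sketch (assumptions noted above): float16 stands in for the paper's
# reduced-precision accumulation format; names and parameters are illustrative.
import numpy as np


def fp16_serial_sum(values):
    """Serially accumulate every value into one float16 accumulator.
    Once the running sum grows large, small addends are swamped (lost)."""
    acc = np.float16(0.0)
    for v in values:
        acc = np.float16(acc + np.float16(v))
    return acc


def fp16_chunked_sum(values, chunk_size=64):
    """Chunk-based accumulation: sum short fixed-size chunks in float16, then
    accumulate the per-chunk partial sums. Every accumulation chain stays
    short, so the accumulator never dwarfs its addends."""
    partials = [fp16_serial_sum(values[i:i + chunk_size])
                for i in range(0, len(values), chunk_size)]
    return fp16_serial_sum(partials)


def fp16_stochastic_round(x, rng):
    """Stochastically round a float32 scalar to float16: pick the farther of
    the two neighbouring float16 values with probability proportional to the
    distance from x to the nearer one, so rounding is unbiased in expectation."""
    near = np.float16(x)
    if float(near) == float(x):
        return near
    direction = np.float16(np.inf) if float(x) > float(near) else np.float16(-np.inf)
    far = np.nextafter(near, direction)
    p = abs(float(x) - float(near)) / abs(float(far) - float(near))
    return far if rng.random() < p else near


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 0.01, size=1 << 15).astype(np.float32)

    print("float32 reference:", float(np.sum(x, dtype=np.float32)))
    print("naive fp16 sum   :", float(fp16_serial_sum(x)))    # swamped, far too small
    print("chunked fp16 sum :", float(fp16_chunked_sum(x)))   # much closer to the reference

    samples = [float(fp16_stochastic_round(np.float32(0.1003), rng)) for _ in range(10000)]
    print("stochastic round mean:", sum(samples) / len(samples), "(~0.1003 in expectation)")
```

Running the script shows the naive float16 accumulator saturating well below the float32 reference once the running sum dwarfs the individual addends, while the chunked version tracks the reference closely and the stochastic-rounding samples average back to the unrounded value.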
