AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference

Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low word sizes because their shrinking dynamic ranges cannot adequately capture the wide data distributions commonly seen in sequence transduction models. We present AdaptivFloat, a floating-point-inspired number representation format for deep learning that dynamically maximizes and optimally clips its available dynamic range, at a layer granularity, in order to create faithful encodings of neural network parameters. AdaptivFloat consistently produces higher inference accuracies than block floating-point, uniform, IEEE-like float, or posit encodings at very low precision ($\leq$ 8-bit) across a diverse set of state-of-the-art neural network topologies. Notably, AdaptivFloat even surpasses baseline FP32 performance by up to +0.3 in BLEU score and -0.75 in word error rate at weight bit widths that are $\leq$ 8-bit. Experimental results on a deep neural network (DNN) hardware accelerator that exploits AdaptivFloat logic in its computational datapath demonstrate per-operation energy and area that are 0.9$\times$ and 1.14$\times$, respectively, those of equivalent bit-width integer-based accelerator variants.
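To make the layer-granularity range adaptation concrete, the NumPy sketch below emulates the idea under stated assumptions: it derives an exponent range per layer from the tensor's maximum magnitude, then rounds and clips each value into that low-bit float grid. The function name, the `n_bits`/`n_exp` parameters, and the flush-to-zero handling are illustrative choices, not the paper's exact specification or released code.

```python
import numpy as np

def adaptivfloat_quantize(w, n_bits=8, n_exp=3):
    """Quantize a layer's weight tensor to an AdaptivFloat-style low-bit float.

    Illustrative sketch only: the format is assumed to have 1 sign bit,
    `n_exp` exponent bits, and the remaining bits as mantissa. The exponent
    range is picked per layer so that the largest representable magnitude
    just covers max(|w|), i.e. the layer-wise adaptation described above.
    """
    n_mant = n_bits - 1 - n_exp
    max_mant = 2.0 - 2.0 ** (-n_mant)          # largest mantissa, 1.11...1 in binary
    w = np.asarray(w, dtype=np.float64)
    w_abs_max = np.max(np.abs(w))
    if w_abs_max == 0.0:
        return np.zeros_like(w)

    # Choose the largest exponent so that max_mant * 2**exp_max >= max(|w|).
    exp_max = int(np.ceil(np.log2(w_abs_max / max_mant)))
    exp_min = exp_max - (2 ** n_exp - 1)       # smallest exponent in the format

    sign = np.sign(w)
    # Clip magnitudes into the representable range before rounding.
    mag = np.clip(np.abs(w), 2.0 ** exp_min, max_mant * 2.0 ** exp_max)
    # Quantize the mantissa to n_mant fractional bits at each value's exponent.
    exp = np.floor(np.log2(mag))
    step = 2.0 ** (exp - n_mant)
    q = np.round(mag / step) * step
    # Flush values below half the smallest representable magnitude to zero.
    q = np.where(np.abs(w) < 2.0 ** exp_min / 2.0, 0.0, q)
    return sign * q
```

For instance, `adaptivfloat_quantize(layer_weights, n_bits=4, n_exp=2)` would emulate a hypothetical 4-bit variant with a 2-bit exponent; because `exp_max` is recomputed per layer, wide-range layers and narrow-range layers each get a grid matched to their own distribution.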
