Cheetah: Mixed Low-Precision Hardware & Software Co-Design Framework for DNNs on the Edge

Low-precision DNNs have been extensively explored to reduce the size of DNN models for edge devices. Recently, the posit numerical format has shown promise for DNN data representation and computation at ultra-low, [5..8]-bit precision. However, previous studies have been limited to posit-based DNN inference. In this paper, we propose the Cheetah framework, which supports both DNN training and inference using posits, as well as other commonly used formats. The framework is also amenable to different quantization approaches and supports mixed-precision floating-point and fixed-point numerical formats. Cheetah is evaluated on three datasets: MNIST, Fashion MNIST, and CIFAR-10. Results indicate that 16-bit posits outperform 16-bit floating point in DNN training. Furthermore, performing inference with [5..8]-bit posits improves the trade-off between performance and energy-delay product over both [5..8]-bit floating point and fixed point.
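A posit encodes a real value with a sign bit, a run-length-encoded regime, up to es exponent bits, and a fraction with a hidden leading one, giving tapered precision centered around 1.0. As a minimal illustration of the posit arithmetic the framework builds on (not Cheetah's implementation), the Python sketch below decodes an n-bit posit bit pattern into a float; the function name and the defaults n=8, es=1 are assumptions chosen for this example.

```python
def decode_posit(bits: int, n: int = 8, es: int = 1) -> float:
    """Decode an n-bit posit bit pattern (es exponent bits) to a float.

    Layout: sign | regime (run-length encoded) | up to es exponent bits | fraction.
    Value = (-1)^sign * useed^k * 2^exponent * (1 + fraction), with useed = 2^(2^es).
    """
    mask = (1 << n) - 1
    bits &= mask

    # Special patterns: zero and NaR (Not a Real).
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")

    # Negative posits: two's-complement the pattern, decode, then negate.
    sign = -1.0 if bits >> (n - 1) else 1.0
    if sign < 0.0:
        bits = (-bits) & mask

    # Regime: run of identical bits below the sign, ended by the opposite bit.
    body = bits & ((1 << (n - 1)) - 1)
    first = (body >> (n - 2)) & 1
    run = 0
    while run < n - 1 and ((body >> (n - 2 - run)) & 1) == first:
        run += 1
    k = run - 1 if first else -run

    # Bits left after the regime and its terminating bit.
    remaining = max(n - 2 - run, 0)
    tail = body & ((1 << remaining) - 1)

    # Exponent: up to `es` bits, zero-padded on the right if truncated.
    exp_bits = min(es, remaining)
    exponent = (tail >> (remaining - exp_bits)) << (es - exp_bits) if exp_bits else 0

    # Fraction with a hidden leading one.
    frac_len = remaining - exp_bits
    frac = 1.0 + (tail & ((1 << frac_len) - 1)) / (1 << frac_len)

    useed = 2 ** (2 ** es)
    return sign * (useed ** k) * (2.0 ** exponent) * frac
```

With these defaults, decode_posit(0b01010000) evaluates to 2.0 (regime k=0, one exponent bit set), and the sign-bit-only pattern 0b10000000 decodes to NaR (returned here as NaN). The run-length-encoded regime is what gives posits their tapered accuracy: values near 1.0 keep more fraction bits than values far from it.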
