Rethinking Numerical Representations for Deep Neural Networks

With the ever-increasing computational demands of deep learning, it is critical to investigate how the numeric representation and precision of DNN model weights and activations affect computational efficiency. In this work, we explore unconventional narrow-precision floating-point representations and their impact on inference accuracy and efficiency, in order to guide the design of future DNN platforms. We show that inference using these custom numeric representations on production-grade DNNs, including GoogLeNet and VGG, achieves an average speedup of 7.6x with less than 1% degradation in inference accuracy relative to a state-of-the-art baseline platform using single-precision floating point. To facilitate the use of such customized precision, we also present a novel technique that drastically reduces the time required to derive the optimal precision configuration.
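For intuition, one way a customized narrow-precision floating-point format can be emulated in software is by rounding each float32 value to the nearest value representable with a reduced exponent and mantissa width. The sketch below is a minimal illustration of this idea under standard IEEE-style conventions; the function quantize_custom_float, its bit-width parameters, and its handling of underflow and overflow are hypothetical simplifications, not the paper's implementation.

    import numpy as np

    def quantize_custom_float(x, exp_bits=5, man_bits=10):
        """Round float32 values to the nearest value representable in a
        hypothetical float format with exp_bits exponent bits and man_bits
        mantissa bits (underflow flushes to zero, overflow saturates)."""
        x = np.asarray(x, dtype=np.float32)
        bias = 2 ** (exp_bits - 1) - 1               # IEEE-style exponent bias
        # Decompose x = m * 2**e with m in [0.5, 1) (numpy frexp convention)
        m, e = np.frexp(x)
        # Keep only man_bits fractional bits of the mantissa
        m = np.round(m * 2.0 ** man_bits) / 2.0 ** man_bits
        y = np.ldexp(m, e)
        # Flush values below the smallest normal magnitude to zero
        y = np.where(np.abs(y) < 2.0 ** (1 - bias), 0.0, y)
        # Saturate values above the largest representable magnitude
        max_val = (2.0 - 2.0 ** (-man_bits)) * 2.0 ** bias
        return np.clip(y, -max_val, max_val).astype(np.float32)

    # Example: quantize a weight tensor to a 5-bit-exponent, 7-bit-mantissa format
    w = np.random.randn(4, 4).astype(np.float32)
    w_q = quantize_custom_float(w, exp_bits=5, man_bits=7)

Sweeping the exponent and mantissa widths in such a simulation is one way to explore the accuracy/precision trade-off that the paper's precision-selection technique is designed to navigate efficiently.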
