Deep Learning with Limited Numerical Precision

Training of large-scale deep neural networks is often constrained by the available computational resources. We study the effect of limited-precision data representation and computation on neural network training. Within the context of low-precision fixed-point computations, we observe that the rounding scheme plays a crucial role in determining the network's behavior during training. Our results show that deep networks can be trained using only a 16-bit-wide fixed-point number representation when stochastic rounding is used, and incur little to no degradation in classification accuracy. We also demonstrate an energy-efficient hardware accelerator that implements low-precision fixed-point arithmetic with stochastic rounding.
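To make the stochastic rounding scheme concrete, the sketch below quantizes values to a signed fixed-point format with word length `wl` and fractional length `fl`: each value is rounded down to the nearest representable level with probability proportional to its distance from the level above, and up otherwise, so the rounding is unbiased in expectation. This is a minimal NumPy illustration; the function name, parameter names, and the specific <16, 14> defaults are assumptions for the example, not the paper's implementation.

```python
import numpy as np

def stochastic_round_fixed_point(x, wl=16, fl=14, rng=None):
    """Quantize a float array to a signed <wl, fl> fixed-point format
    using stochastic rounding, so that E[round(x)] = x.

    Illustrative sketch; wl/fl defaults are assumed, not from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = 2.0 ** -fl                       # resolution of the format
    scaled = np.asarray(x, dtype=np.float64) / eps
    floor = np.floor(scaled)
    frac = scaled - floor                  # in [0, 1): distance above the lower level
    # Round up with probability equal to the fractional part.
    rounded = (floor + (rng.random(scaled.shape) < frac)) * eps
    # Saturate to the representable range of a signed wl-bit word.
    max_val = (2.0 ** (wl - 1) - 1) * eps
    min_val = -(2.0 ** (wl - 1)) * eps
    return np.clip(rounded, min_val, max_val)

# Unlike round-to-nearest, stochastic rounding preserves small values
# on average, which is what keeps low-precision gradient updates alive:
x = np.full(100_000, 0.1)                  # not exactly representable at fl=14
print(stochastic_round_fixed_point(x).mean())  # ~0.1 on average
```

The key property the example demonstrates is unbiasedness: values smaller than the format's resolution are not systematically flushed to zero, which is why accumulating many tiny weight updates still works at 16 bits.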
