Recurrent Neural Networks With Limited Numerical Precision

Recurrent Neural Networks (RNNs) produce state-of-the-art performance on many machine learning tasks, but their demands on memory and computational power are often high. There is therefore great interest in optimizing the computations performed with these models, especially when considering the development of specialized low-power hardware for deep networks. One way of reducing the computational requirements is to limit the numerical precision of the network weights and biases. This has led to several proposed rounding methods, which have so far been applied only to Convolutional Neural Networks and Fully-Connected Networks. This paper addresses the question of how best to reduce weight precision during training in the case of RNNs. We present results from applying different stochastic and deterministic reduced-precision training methods to three major RNN types, which are then tested on several datasets. The results show that weight binarization does not work with RNNs. However, the stochastic and deterministic ternarization and pow2-ternarization methods yield low-precision RNNs that achieve accuracy similar to, and on certain datasets even higher than, the full-precision baselines, thereby providing a path towards training more efficient implementations of RNNs in specialized hardware.
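
For concreteness, below is a minimal NumPy sketch of the kinds of rounding rules discussed above (deterministic and stochastic ternarization, and pow2-ternarization). The threshold, exponent cut-off, and function names are illustrative assumptions rather than the paper's exact formulations; in such schemes the quantized weights are typically used during propagation while full-precision copies accumulate the gradient updates.

```python
import numpy as np

def ternarize_deterministic(w, threshold=0.5):
    # Map each weight to {-1, 0, +1}: zero out small weights, keep the
    # sign of the rest.  The threshold value is an illustrative choice.
    return np.sign(w) * (np.abs(w) >= threshold)

def ternarize_stochastic(w, rng=None):
    # Round each weight to {-1, 0, +1} at random, with the probability
    # of rounding away from zero proportional to |w| (weights assumed
    # clipped to [-1, 1], as is common in low-precision training).
    rng = np.random.default_rng() if rng is None else rng
    w = np.clip(w, -1.0, 1.0)
    keep = rng.random(w.shape) < np.abs(w)
    return np.sign(w) * keep

def pow2_ternarize(w, min_exp=-8):
    # Round each weight to zero or to a signed power of two, so that
    # multiplications can be replaced by bit shifts.  The exponent
    # cut-off `min_exp` is an assumed hyperparameter.
    out = np.zeros_like(w)
    mask = np.abs(w) > 2.0 ** min_exp
    exp = np.round(np.log2(np.abs(w[mask])))
    out[mask] = np.sign(w[mask]) * 2.0 ** exp
    return out

# Example: quantize a random recurrent weight matrix for the forward pass.
W = np.random.uniform(-1, 1, size=(128, 128))
W_ternary = ternarize_stochastic(W)
```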
