Conversion of artificial recurrent neural networks to spiking neural networks for low-power neuromorphic hardware

In recent years, the field of low-power neuromorphic systems has gained significant momentum, spurring brain-inspired hardware that operates on principles fundamentally different from those of standard digital computers and thereby consumes orders of magnitude less power. However, the wider use of such systems is still hindered by the lack of algorithms that can harness the strengths of these architectures. While neuromorphic adaptations of representation learning algorithms are now emerging, the efficient processing of temporal sequences and variable-length inputs remains difficult, partly due to challenges in representing and configuring the dynamics of spiking neural networks. Recurrent neural networks (RNNs) are widely used in machine learning to solve a variety of sequence learning tasks. In this work we present a "train-and-constrain" methodology that enables machine-learned (Elman) RNNs to be mapped onto a substrate of spiking neurons, while remaining compatible with the capabilities of current and near-future neuromorphic systems. The method consists of first training the RNN using backpropagation through time, then discretizing the weights, and finally converting the network to a spiking RNN by matching the responses of the artificial neurons with those of the spiking neurons. We demonstrate the approach on a natural language processing task (question classification), mapping the recurrent layer of the network onto IBM's Neurosynaptic System TrueNorth, a spike-based digital neuromorphic hardware architecture. TrueNorth imposes specific constraints on connectivity and on neural and synaptic parameters; to satisfy these constraints, we discretized the synaptic weights to 16 levels, discretized the neural activities to 16 levels, and limited the fan-in to 64 inputs per neuron. Surprisingly, we find that short synaptic delays are sufficient to implement the dynamic (temporal) aspect of the RNN in the question classification task. Furthermore, we observed that discretizing the neural activities is beneficial to the train-and-constrain approach. The hardware-constrained model achieved 74% accuracy in question classification while using less than 0.025% of the cores on one TrueNorth chip, corresponding to an estimated power consumption of ≈17 μW.
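To make the "constrain" step concrete, the following is a minimal Python sketch of the discretization applied to a trained Elman layer. It is illustrative only: the uniform symmetric quantization range, the ReLU nonlinearity, and the names `W_in`, `W_rec`, `b` are assumptions for this sketch, not the paper's exact TrueNorth mapping (which additionally matches the spiking neuron model and core fan-in constraints).

```python
import numpy as np

def quantize(x, levels=16, x_max=1.0):
    """Uniformly discretize values to `levels` bins in [-x_max, x_max].
    16 levels mirrors the weight/activation resolution reported in the
    paper; the symmetric range is an illustrative assumption."""
    step = 2.0 * x_max / (levels - 1)
    return np.clip(np.round(x / step) * step, -x_max, x_max)

class ConstrainedElmanLayer:
    """Sketch of a hardware-constrained Elman recurrent layer.

    The weights W_in, W_rec and bias b are assumed to come from a
    standard backpropagation-through-time training run; they are
    discretized to 16 levels at construction time, and the activations
    are discretized at every step, standing in for the rate-based
    response of the spiking neurons."""

    def __init__(self, W_in, W_rec, b, levels=16):
        self.W_in = quantize(W_in, levels)    # 16-level synaptic weights
        self.W_rec = quantize(W_rec, levels)
        self.b = b
        self.levels = levels

    def step(self, x, h):
        # Elman update with ReLU, followed by activity discretization.
        pre = self.W_in @ x + self.W_rec @ h + self.b
        h_new = np.maximum(pre, 0.0)                   # ReLU activation
        return quantize(h_new, self.levels, x_max=1.0) # 16 activity levels
```

In a full mapping, the 64-input fan-in limit of a TrueNorth core would additionally be enforced, e.g. by splitting wide layers across several cores; that step is omitted from this sketch.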
