A Spike in Performance: Training Hybrid-Spiking Neural Networks with Quantized Activation Functions

The machine learning community has become increasingly interested in the energy efficiency of neural networks. The Spiking Neural Network (SNN) is a promising approach to energy-efficient computing, since its activation levels are quantized into temporally sparse, one-bit values (i.e., "spike" events), which additionally converts the sum over weight-activity products into a simple addition of weights (one weight for each spike). However, the goal of maintaining state-of-the-art (SotA) accuracy when converting a non-spiking network into an SNN has remained an elusive challenge, primarily due to spikes having only a single bit of precision. Adopting tools from signal processing, we cast neural activation functions as quantizers with temporally-diffused error, and then train networks while smoothly interpolating between the non-spiking and spiking regimes. We apply this technique to the Legendre Memory Unit (LMU) to obtain the first known example of a hybrid SNN outperforming SotA recurrent architectures---including the LSTM, GRU, and NRU---in accuracy, while reducing activities to at most 3.74 bits on average with 1.26 significant bits multiplying each weight. We discuss how these methods can significantly improve the energy efficiency of neural networks.
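The core idea in the abstract, treating the activation function as a quantizer whose rounding error is carried forward in time so that it averages out, can be illustrated with a short sketch. The snippet below is a minimal illustration rather than the paper's implementation: the function name, the clipped activity range [0, 1], and the bit-width schedule are assumptions made for the demo. Lowering `bits` from 8 toward 1 interpolates between an effectively non-spiking (high-precision) activation and a spiking (one-bit) activation, while the error feedback keeps the time-averaged output close to the underlying activity.

```python
import numpy as np

def quantize_with_error_feedback(u, bits, carry):
    """Quantize an activity u in [0, 1] to `bits` of precision, diffusing the
    rounding error over time by carrying it into the next step.
    Illustrative sketch; names and details are assumptions, not the paper's code."""
    levels = 2 ** bits - 1          # number of quantization steps above zero
    v = u + carry                   # add the error carried over from earlier steps
    q = np.clip(np.round(v * levels), 0, levels) / levels  # nearest allowed level
    carry = v - q                   # residual error, pushed into future steps
    return q, carry

# Toy demo: a constant activity of 0.63, quantized over 1000 time steps.
# At bits=1 the output is a stream of 0/1 "spikes"; at bits=8 it is nearly
# continuous. In every case the time average approaches 0.63.
x = 0.63
for bits in (8, 4, 2, 1):
    carry, outputs = 0.0, []
    for _ in range(1000):
        q, carry = quantize_with_error_feedback(x, bits, carry)
        outputs.append(q)
    print(f"bits={bits}: mean output = {np.mean(outputs):.4f}")
```

In the paper the interpolation happens during training, so the network adapts gradually from the non-spiking regime to the spiking one; the sketch above only demonstrates the quantizer itself under the stated assumptions.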
