Unsupervised Adaptive Weight Pruning for Energy-Efficient Neuromorphic Systems

To tackle real-world challenges, deep and complex neural networks with massive numbers of parameters are generally used, which demand large memory, extensive computation, and high energy consumption in neuromorphic hardware systems. In this work, we propose an unsupervised online adaptive weight pruning method that dynamically removes non-critical weights from a spiking neural network (SNN) to reduce network complexity and improve energy efficiency. The method exploits the neural dynamics and firing activity of the SNN, adapting the pruning threshold across neurons and over time during training. This adaptation allows the network to effectively identify the critical weights associated with each neuron, balances each neuron's connection strength to the previous layer, and prevents weakly firing neurons from failing after pruning. We also evaluate the improvement in energy efficiency by counting synaptic operations (SOPs). Simulation results and detailed analyses reveal that adapting the pruning threshold significantly improves network performance and reduces the number of SOPs. A pruned SNN with 800 excitatory neurons achieves a 30% reduction in SOPs during training and a 55% reduction during inference, with only 0.44% accuracy loss on the MNIST dataset. Compared with a previously reported online soft pruning method, the proposed adaptive method achieves 3.33% higher classification accuracy and a 67% greater reduction in SOPs. The effectiveness of the method is confirmed on different datasets and for different network sizes, and our evaluation shows that the implementation overhead of the adaptation in speed, area, and energy is negligible. This work therefore offers a promising solution for effective network compression and for building highly energy-efficient neuromorphic systems for real-time applications.
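The per-neuron adaptive pruning described above can be illustrated with a minimal Python/NumPy sketch. The function name, the specific threshold-adaptation rule, and all parameter values below are illustrative assumptions for exposition, not the authors' exact formulation: each excitatory neuron keeps its own pruning threshold, the threshold is nudged up or down according to the neuron's recent firing rate, and incoming weights whose magnitude falls below that threshold are masked out during training.

    import numpy as np

    def adaptive_prune(weights, mask, firing_rates, thresholds,
                       target_rate=5.0, eta=0.01):
        """One hypothetical pruning step for an input->excitatory layer.

        weights      : (n_in, n_exc) synaptic weight matrix
        mask         : (n_in, n_exc) boolean mask; False = pruned synapse
        firing_rates : (n_exc,) recent firing rate of each excitatory neuron
        thresholds   : (n_exc,) per-neuron pruning thresholds

        Assumed adaptation rule: neurons firing above the target rate can
        afford a higher threshold (more pruning), while weakly firing
        neurons lower theirs, protecting their remaining connections.
        """
        thresholds = np.clip(thresholds + eta * (firing_rates - target_rate),
                             0.0, None)              # thresholds stay non-negative
        mask = mask & (np.abs(weights) >= thresholds)  # prune below each neuron's threshold
        return weights * mask, mask, thresholds

    # Illustrative usage with random values (784 inputs, 800 excitatory neurons):
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.1, size=(784, 800))
    m = np.ones_like(w, dtype=bool)
    rates = rng.uniform(0.0, 10.0, size=800)
    th = np.full(800, 0.02)
    w, m, th = adaptive_prune(w, m, rates, th)

Because a pruned synapse no longer triggers a synaptic operation when its presynaptic neuron spikes, the SOP count per input spike falls in proportion to the surviving fan-out, which is the source of the training- and inference-time savings reported above.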
