Efficient Algorithms for Accelerating Spiking Neural Networks on MAC Array of SpiNNaker 2

CPU-based systems are widely used to simulate brain-inspired spiking neural networks (SNNs) because of their flexibility. However, processing the high input spike rates caused by immature coding mechanisms costs many CPU cycles, and the additional information required by serial execution calls for a time-consuming pre- and post-neuron matching algorithm. To address these issues, we propose a set of algorithms that leverage the multiply-accumulate (MAC) array to accelerate SNN inference. By rearranging and compressing operands losslessly, we retain the MAC array's advantage in fast parallel computing while reducing the wasted memory and computing resources that result from the inherent sparsity of SNNs and from the memory alignment imposed by the fixed MAC hardware structure. Benchmarked on an SNN radar gesture recognition model, the algorithms together reduce execution time by 82.71% compared with serial computation on the ARM M4F of the SpiNNaker 2 chip, and reduce the memory footprint by 49.89% compared with the unoptimized MAC calculation. This article extends the application field of the General Sparse Matrix-Matrix Multiplication (SpGEMM) problem to SNNs and develops novel SpGEMM optimization algorithms suited to the characteristics of SNNs and the MAC array.
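As a rough illustration of what lossless operand compression can mean here, the sketch below drops pre-neurons that never spike in a time window before running a dense product, then compares the padded inner dimension a fixed-size MAC array would have to process. It is not the algorithm set proposed in the article: the tile size `TILE`, the helpers `matmul`, `pad_to` and `compress`, and the toy spike and weight matrices are all assumptions made for this example.

```python
# Hypothetical sketch: drop silent pre-neurons before a dense MAC-style product.
# TILE is an assumed alignment, not the real SpiNNaker 2 MAC array dimension.
TILE = 4


def matmul(a, b):
    """Dense product, standing in for what a MAC array computes."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def pad_to(n, tile=TILE):
    """Round n up to the next multiple of the tile size."""
    return -(-n // tile) * tile


def compress(spikes, weights):
    """Keep only pre-neurons that spike at least once in the window."""
    active = [j for j in range(len(weights)) if any(row[j] for row in spikes)]
    kept_spikes = [[row[j] for j in active] for row in spikes]
    kept_weights = [weights[j] for j in active]
    return kept_spikes, kept_weights


# 3 time steps x 6 pre-neurons (binary spikes); 6 pre-neurons x 2 post-neurons
spikes = [
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
]
weights = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]]

dense_out = matmul(spikes, weights)
small_spikes, small_weights = compress(spikes, weights)
compressed_out = matmul(small_spikes, small_weights)

# Padded inner dimension the MAC array would process in each case
dense_inner = pad_to(len(weights))
compressed_inner = pad_to(len(small_weights))
```

In this toy case both products are identical because the removed columns of the spike matrix contain only zeros, while the padded inner dimension shrinks from 8 to 4. How much is saved in practice depends on the spike sparsity and on the real array geometry.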
