A Systolic SNN Inference Accelerator and its Co-optimized Software Framework

Although Deep Neural Network (DNN) architectures have achieved breakthroughs in computer vision tasks, they remain far removed from the neurons of the biological brain. Spiking Neural Networks (SNNs) are widely expected to bridge the gap between artificial computing systems and biological systems, and they also show great potential for low-power computing. This paper presents a low-power hardware accelerator for SNN inference based on a systolic array, together with a co-optimized software framework. First, we present the hardware design, which adopts a systolic array informed by our exploration of SNN workloads. Second, we define a data mapping onto the systolic array that guarantees computational correctness. Third, we apply compression methods to reduce both runtime and memory footprint. Finally, we make the systolic array size-configurable so that it adapts to different input sizes, further reducing computational overhead. We implement the accelerator on a Xilinx Virtex-7 690T FPGA. Experimental results show that SNN inference on our scheme loses little accuracy (less than 0.1%) on MNIST and Fashion-MNIST, and the runtime of the most time-consuming layers decreases. The total power of our scheme is 0.745 W at 100 MHz.
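The abstract only sketches the datapath, but the pairing of systolic arrays with SNNs has an intuitive payoff: because spikes are binary, each processing element's multiply-accumulate collapses into a conditional add of its stationary weight. The following minimal Python sketch is our illustration of that idea, not the paper's implementation; the integrate-and-fire dynamics, reset-to-zero behavior, and all function and variable names are assumptions made for clarity.

    import numpy as np

    def snn_systolic_step(weights, spikes_in, v_mem, threshold=1.0):
        """One timestep of an integrate-and-fire layer on a weight-stationary array.

        weights:   (n_out, n_in) stationary weights, one per PE
        spikes_in: (n_in,) binary input spikes streamed into the array
        v_mem:     (n_out,) membrane potentials accumulated across timesteps
        """
        # With 0/1 inputs, PE(i, j) simply adds weights[i, j] when
        # spikes_in[j] == 1 -- no multipliers are needed.
        v_mem = v_mem + weights @ spikes_in
        # Neurons whose potential crosses the threshold emit a spike.
        spikes_out = (v_mem >= threshold).astype(np.uint8)
        # Reset-to-zero after firing (one common SNN reset scheme).
        v_mem[spikes_out == 1] = 0.0
        return spikes_out, v_mem

    # Example: one layer of 4 neurons fed by 8 inputs for a single timestep.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 8)).astype(np.float32)
    v = np.zeros(4, dtype=np.float32)
    s = (rng.random(8) < 0.3).astype(np.uint8)
    out, v = snn_systolic_step(W, s, v)

In a hardware realization, the matrix-vector product would be computed by spikes flowing through the PE grid over several cycles rather than in one step; the sketch compresses that pipeline into a single expression to highlight the add-only arithmetic.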
