A 2.56-mm² 718-GOPS Configurable Spiking Convolutional Sparse Coding Accelerator in 40-nm CMOS

A configurable neuro-inspired inference accelerator is designed as an array of neurons, each operating in an independent clock domain. The accelerator implements a recurrent network using a novel sparse convolution for feedforward operations and sparse spike-driven reconstruction for feedback operations. The proposed sparse convolution efficiently skips zero patches and can be configured to support practically any image and kernel size. A globally asynchronous, locally synchronous (GALS) architecture enables a scalable design and load balancing, achieving a 22% reduction in power. Fabricated in 40-nm CMOS, the 2.56-mm² inference accelerator integrates 48 neurons, a hub, and an OpenRISC processor. The chip achieves 718 GOPS at 380 MHz and demonstrates feature extraction from images and depth extraction from stereo images.
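The zero-patch skipping behind the sparse convolution can be sketched as follows. This is a minimal software illustration, not the chip's datapath: the function name, the dense sliding-window formulation, and the returned skip count are assumptions added for clarity. The key point is that an input window containing only zeros contributes nothing to the output, so its multiply-accumulate operations can be elided entirely.

```python
import numpy as np

def sparse_conv2d(image, kernel):
    """Valid-mode 2-D convolution that skips all-zero input patches.

    Returns the output map and the number of patches skipped, so the
    savings from input sparsity are directly visible.
    """
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    skipped = 0
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]
            if not patch.any():        # zero patch: elide all MACs
                skipped += 1
                continue
            out[i, j] = np.sum(patch * kernel)
    return out, skipped
```

On spike-driven activity maps, where most patches are all-zero, the fraction of skipped windows (and hence of elided arithmetic) approaches the input sparsity.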
