A 2.56-mm² 718-GOPS Configurable Spiking Convolutional Sparse Coding Accelerator in 40-nm CMOS

A configurable neuro-inspired inference accelerator is designed as an array of neurons, each operating in an independent clock domain. The accelerator implements a recurrent network using a novel sparse convolution for feedforward operations and sparse spike-driven reconstruction for feedback operations. The proposed sparse convolution efficiently skips zero patches and can be configured to support practically any image and kernel size. A globally asynchronous, locally synchronous (GALS) architecture enables a scalable design and load balancing, achieving a 22% reduction in power. Fabricated in 40-nm CMOS, the 2.56-mm² inference accelerator integrates 48 neurons, a hub, and an OpenRISC processor. The chip achieves 718 GOPS at 380 MHz and demonstrates feature extraction from images and depth extraction from stereo images.
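The zero-patch skipping behind the sparse convolution can be sketched as follows. This is a minimal software illustration, not the chip's datapath: the function name, the dense sliding-window formulation, and the returned skip count are assumptions added for clarity. The key point is that an input window containing only zeros contributes nothing to the output, so its multiply-accumulate operations can be elided entirely.

```python
import numpy as np

def sparse_conv2d(image, kernel):
    """Valid-mode 2-D convolution that skips all-zero input patches.

    Returns the output map and the number of patches skipped, so the
    savings from input sparsity are directly visible.
    """
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    skipped = 0
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]
            if not patch.any():        # zero patch: elide all MACs
                skipped += 1
                continue
            out[i, j] = np.sum(patch * kernel)
    return out, skipped
```

On spike-driven activity maps, where most patches are all-zero, the fraction of skipped windows (and hence of elided arithmetic) approaches the input sparsity.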
