A 104.8TOPS/W One-Shot Time-Based Neuromorphic Chip Employing Dynamic Threshold Error Correction in 65nm

As neural networks continue to infiltrate diverse application domains, computing will move out of the cloud and onto edge devices, necessitating fast, reliable, and low-power solutions. To meet these requirements, we propose a time-domain core using one-shot delay measurements and a lightweight post-processing technique, Dynamic Threshold Error Correction (DTEC). This design differs from traditional digital implementations in that it uses the delay accumulated through a simple inverter chain distributed throughout an SRAM array to intrinsically compute resource-intensive multiply-accumulate (MAC) operations. Implemented in 65nm LP CMOS, we achieve, to our knowledge, the highest reported energy efficiency for a neuromorphic processor: 52.4 TSOp/s/W (104.8 TOp/s/W) at 0.7V with 3b resolution, corresponding to 19.1 fJ/MAC.
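
For intuition, the short behavioral sketch below (Python/NumPy, not from the paper) models the basic idea of a time-domain MAC: each active input lets its SRAM row add a weight-proportional delay to a one-shot pulse travelling through the inverter chain, and the total accumulated delay is compared against a set of thresholds to produce a 3b output. The unit delay, the threshold spacing, and the DTEC-style threshold shift are illustrative assumptions only; the abstract does not specify the actual correction rule.

import numpy as np

# Assumed delay contribution of one unit of weight (arbitrary units).
T_UNIT = 1.0

def time_domain_mac(inputs, weights):
    # Accumulated delay of one "one-shot" pulse through the chain:
    # only rows with an active (nonzero) input add their weight's delay.
    return T_UNIT * np.sum(inputs * weights)

def quantize_3b(delay, thresholds):
    # Convert the measured delay into a 3b code by threshold comparison.
    return int(np.searchsorted(thresholds, delay))

# Example: binary input activations against 3b weights stored in the array.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=16)                     # binary activations
w = rng.integers(0, 8, size=16)                     # 3b weights
delay = time_domain_mac(x, w)

# Hypothetical DTEC-style post-processing step: shift the readout
# thresholds by an estimated error term before quantization (placeholder;
# the real correction is not described in the abstract).
base_thresholds = np.linspace(0, T_UNIT * 7 * 16, 8)[1:]
estimated_error = 0.0
code = quantize_3b(delay, base_thresholds + estimated_error)
print(delay, code)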
