CMOS-integrated memristive non-volatile computing-in-memory for AI edge processors

Non-volatile computing-in-memory (nvCIM) could improve the energy efficiency of edge devices for artificial intelligence applications. The basic functionality of nvCIM has recently been demonstrated using small-capacity memristor crossbar arrays combined with peripheral readout circuits made from discrete components. However, the advantages of the approach in terms of energy efficiency and operating speeds, as well as its robustness against device variability and sneak currents, have yet to be demonstrated experimentally. Here, we report a fully integrated memristive nvCIM structure that offers high energy efficiency and low latency for Boolean logic and multiply-and-accumulation (MAC) operations. We fabricate a 1 Mb resistive random-access memory (ReRAM) nvCIM macro that integrates a one-transistor–one-resistor ReRAM array with control and readout circuits on the same chip using an established 65 nm foundry complementary metal–oxide–semiconductor (CMOS) process. The approach offers an access time of 4.9 ns for three-input Boolean logic operations, a MAC computing time of 14.8 ns and an energy efficiency of 16.95 tera operations per second per watt. Applied to a deep neural network using a split binary-input ternary-weighted model, the system can achieve an inference accuracy of 98.8% on the MNIST dataset.
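To make the binary-input ternary-weighted MAC operation more concrete, the following is a minimal software sketch, not the circuit described above. It assumes that "split" means each ternary weight (-1, 0, +1) is stored as a pair of binary cells, one for the positive part and one for the negative part, and that the two partial sums are subtracted; the abstract does not spell out this mapping, so it is an assumption. All function names are hypothetical. The last helper simply converts the reported 16.95 TOPS/W into energy per operation.

```python
def split_ternary(weights):
    """Split ternary weights (-1, 0, +1) into two binary masks.

    Assumption: the positive and negative parts are stored separately
    as binary values, which is one common way to map ternary weights
    onto binary memory cells.
    """
    for w in weights:
        if w not in (-1, 0, 1):
            raise ValueError(f"weight {w!r} is not ternary")
    positive = [1 if w == 1 else 0 for w in weights]
    negative = [1 if w == -1 else 0 for w in weights]
    return positive, negative


def binary_mac(inputs, mask):
    """Accumulate the products of binary inputs and binary stored bits."""
    return sum(x & m for x, m in zip(inputs, mask))


def ternary_mac(inputs, weights):
    """MAC result as the difference of the positive and negative partial sums."""
    positive, negative = split_ternary(weights)
    return binary_mac(inputs, positive) - binary_mac(inputs, negative)


def energy_per_op_fj(tops_per_watt):
    """Convert tera-operations per second per watt to femtojoules per operation."""
    return 1e15 / (tops_per_watt * 1e12)


if __name__ == "__main__":
    inputs = [1, 0, 1, 1]     # binary activations
    weights = [1, -1, 0, 1]   # ternary weights
    print(ternary_mac(inputs, weights))
    print(round(energy_per_op_fj(16.95), 1), "fJ per operation")
```

Under these assumptions, the split result always matches the direct sum of input-weight products, and 16.95 TOPS/W corresponds to roughly 59 fJ per operation.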
