Monolithically Integrated RRAM- and CMOS-Based In-Memory Computing Optimizations for Efficient Deep Learning

Resistive RAM (RRAM) has been presented as a promising memory technology toward deep neural network (DNN) hardware design, with nonvolatility, high density, high ON/OFF ratio, and compatibility with logic process. However, prior RRAM works for DNNs have shown limitations on parallelism for in-memory computing, array efficiency with large peripheral circuits, multilevel analog operation, and demonstration of monolithic integration. In this article, we propose circuit-/device-level optimizations to improve the energy and density of RRAM-based in-memory computing architectures. We report experimental results based on prototype chip design of 128 × 64 RRAM arrays and CMOS peripheral circuits, where RRAM devices are monolithically integrated in a commercial 90-nm CMOS technology. We demonstrate the CMOS peripheral circuit optimization using input-splitting scheme and investigate the implication of higher low resistance state on energy efficiency and robustness. Employing the proposed techniques, we demonstrate RRAM-based in-memory computing with up to 116.0 TOPS/W energy efficiency and 84.2% CIFAR-10 accuracy. Furthermore, we investigate four-level programming with single RRAM device, and report the system-level performance and DNN accuracy results using circuit-level benchmark simulator NeuroSim.

[1]  Onur Mutlu,et al.  Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[2]  Yuan Xie,et al.  DRISA: A DRAM-based Reconfigurable In-Situ Accelerator , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[3]  Jae-sun Seo,et al.  XNOR-SRAM: In-Memory Computing SRAM Macro for Binary/Ternary Deep Neural Networks , 2018, 2018 IEEE Symposium on VLSI Technology.

[4]  Jason Cong,et al.  Scaling for edge inference of deep neural networks , 2018 .

[5]  Ryutaro Yasuhara,et al.  A 4M Synapses integrated Analog ReRAM based 66.5 TOPS/W Neural-Network Processor with Cell Current Controlled Writing and Flexible Network Architecture , 2018, 2018 IEEE Symposium on VLSI Technology.

[6]  Hossein Valavi,et al.  A 64-Tile 2.4-Mb In-Memory-Computing CNN Accelerator Employing Charge-Domain Compute , 2019, IEEE Journal of Solid-State Circuits.

[7]  Meng-Fan Chang,et al.  24.5 A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning , 2019, 2019 IEEE International Solid- State Circuits Conference - (ISSCC).

[8]  Meng-Fan Chang,et al.  24.1 A 1Mb Multibit ReRAM Computing-In-Memory Macro with 14.6ns Parallel MAC Computing Time for CNN Based AI Edge Processors , 2019, 2019 IEEE International Solid- State Circuits Conference - (ISSCC).