A Fully Bit-Flexible Computation in Memory Macro Using Multi-Functional Computing Bit Cell and Embedded Input Sparsity Sensing

Computation in memory (CIM) overcomes the von Neumann bottleneck by minimizing the communication overhead between memory and processing elements. However, using conventional CIM architectures to realize multiply-accumulate operations (MACs) with flexible input and weight bit precision is extremely challenging. This article presents a fully bit-flexible CIM design with compact area and high energy efficiency. The proposed CIM macro employs a novel multi-functional computing bit cell that integrates the MAC and the A/D conversion to maximize efficiency and flexibility. Moreover, an embedded input sparsity sensing scheme and a self-adaptive dynamic range (DR) scaling scheme are proposed to minimize the energy-consuming A/D conversions in CIM. Finally, the proposed CIM macro uses an interleaved placement structure to enhance the weight-updating bandwidth and improve layout symmetry. Fabricated in standard 28-nm CMOS technology, the proposed CIM design achieves an area efficiency of 27.7 TOPS/mm² and an energy efficiency of 291 TOPS/W, demonstrating a highly energy- and area-efficient flexible CIM solution.
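To make the bit-flexible MAC and the sparsity-driven DR scaling concrete, the following is a minimal behavioral sketch in Python (not the fabricated macro or its circuits). It assumes a bit-serial decomposition of unsigned inputs and weights into binary planes, a per-plane 1-b x 1-b partial MAC, and an idealized ADC whose resolution for each plane is set by the number of active input bits sensed on that plane; the function name, parameters, and bit widths are illustrative assumptions, not the paper's interface.

```python
import numpy as np

def bit_serial_mac(inputs, weights, in_bits=4, w_bits=4):
    """Behavioral sketch of one bit-flexible CIM MAC column.

    Inputs and weights are decomposed into binary planes. Each
    input-plane / weight-plane pair yields a 1-b x 1-b partial MAC
    that an ADC would digitize; shift-and-add recombination then
    restores the full-precision dot product.
    """
    inputs = np.asarray(inputs, dtype=np.int64)
    weights = np.asarray(weights, dtype=np.int64)
    acc = 0
    for i in range(in_bits):                      # input bit plane, LSB first
        in_plane = (inputs >> i) & 1
        # Embedded input sparsity sensing: the number of active input bits
        # on this plane upper-bounds the partial sum, so the ADC dynamic
        # range can be scaled down for sparse planes (a real design would
        # also cap this at the ADC's native resolution).
        active = int(in_plane.sum())
        dr_bits = max(1, int(np.ceil(np.log2(active + 1))))
        for j in range(w_bits):                   # weight bit plane, LSB first
            w_plane = (weights >> j) & 1
            partial = int(np.dot(in_plane, w_plane))   # 1-b x 1-b partial MAC
            assert partial < (1 << dr_bits)            # fits the scaled DR
            acc += partial << (i + j)             # shift-and-add recombination
    return acc

# Usage: the recombined result matches an ordinary integer dot product.
x = np.random.randint(0, 16, size=64)             # 4-b unsigned activations
w = np.random.randint(0, 16, size=64)             # 4-b unsigned weights
assert bit_serial_mac(x, w, in_bits=4, w_bits=4) == int(np.dot(x, w))
```

Because the per-plane partial sum can never exceed the count of active input bits, scaling the conversion range to that count loses no information for sparse planes; this is the intuition behind replacing full-range conversions with sparsity-adaptive ones.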
