15.3 A 351TOPS/W and 372.4GOPS Compute-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications

Compute-in-memory (CIM) parallelizes multiply-and-average (MAV) computations and reduces off-chip weight accesses, lowering both energy consumption and latency, particularly for AI edge devices. Prior CIM approaches trade off area, noise margin, process-variation tolerance, and weight precision. 6T SRAM [1]–[3] offers the smallest cell area for CIM, but cell stability limits the number of simultaneously activated cells, resulting in low parallelism. 10T and twin-8T cells [4]–[5] isolate the read and write paths to improve the noise margin; however, both require custom bitcells designed with logic layout rules, incurring more than a 2x area overhead compared to foundry yield-optimized 6T SRAM. Furthermore, the single-bit weight precision of prior work [1]–[4] cannot meet the requirements of high-precision operation and scalability for large neural networks.
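
To make the MAV operation concrete, the following minimal Python sketch (not from the paper) models one CIM column: all activated rows drive a shared bitline at once, so the sensed analog value approximates the average of the selected weight-activation products. The 64-row array size, 4b signed weight range, and binary activations are illustrative assumptions, not the macro's actual configuration.

```python
import numpy as np

def cim_mav(weights, activations):
    """Model one CIM column computing a multiply-and-average (MAV)
    over all activated rows in a single array access.

    weights     : signed integer array, one weight per row
    activations : binary (0/1) array, one wordline enable per row
    """
    assert weights.shape == activations.shape
    # All selected rows discharge the shared bitline simultaneously;
    # the bitline voltage settles to the average of their products.
    products = weights * activations
    return products.sum() / len(weights)

rng = np.random.default_rng(0)
w = rng.integers(-8, 8, size=64)  # assumed 4b signed weights, one per row
x = rng.integers(0, 2, size=64)   # assumed binary input activations
print(cim_mav(w, x))
```

In hardware the division is implicit in the charge sharing on the bitline; the digital model above simply mirrors that averaging so one access yields the whole column's dot product in parallel.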

[1] S. K. Gonugondla et al., "A 42pJ/Decision 3.12TOPS/W Robust In-Memory Machine Learning Classifier with On-Chip Training," IEEE ISSCC, 2018.

[2] M.-F. Chang et al., "A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning," IEEE ISSCC, 2019.

[3] Z. Wang et al., "In-Memory Computation of a Machine-Learning Classifier in a Standard 6T SRAM Array," IEEE J. Solid-State Circuits, 2017.

[4] M.-F. Chang et al., "A 65nm 4Kb Algorithm-Dependent Computing-in-Memory SRAM Unit-Macro with 2.3ns and 55.8TOPS/W Fully Parallel Product-Sum Operation for Binary DNN Edge Processors," IEEE ISSCC, 2018.

[5] A. Chandrakasan et al., "Conv-RAM: An Energy-Efficient SRAM with Embedded Convolution Computation for Low-Power CNN-Based Machine Learning Applications," IEEE ISSCC, 2018.