A 28nm 64-kb 31.6-TFLOPS/W Digital-Domain Floating-Point-Computing-Unit and Double-Bit 6T-SRAM Computing-in-Memory Macro for Floating-Point CNNs

SRAM-based computing-in-memory (SRAM-CIM) has been intensively studied and developed to improve the energy and area efficiency of AI devices. SRAM-CIMs have effectively implemented high-precision integer (INT) multiply-and-accumulate (MAC) operations to improve the inference accuracy of various image classification tasks [1]–[3], [5], [6]. To realize more complex AI tasks, such as detection and segmentation, and to support on-chip training for better inference accuracy, floating-point MAC (FP-MAC) operations with high energy efficiency are required. However, most previous SRAM-CIMs, whether digital [5], [6] or analog [1]–[4], cannot effectively support FP-MACs (e.g., the Brain Float16 (BF16) datatype). This is because supporting high floating-point input (IN), weight (W), and output (OUT) precision in an SRAM-CIM (1) creates an inconsistency between the shift-alignment required by conventional digital FP-MACs and the structured mapping used by most SRAM-CIMs, and (2) makes the tradeoff between throughput/memory size (T/S), energy efficiency (EF), and memory density (MD) more difficult, as shown in Fig. 7.2.1.
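To make the shift-alignment conflict concrete, the following minimal sketch (not from the paper; the function names and bit manipulations are illustrative assumptions) performs a BF16-style dot product the way a conventional digital FP-MAC would: a mantissa product and exponent sum are formed per term, the largest exponent is found, and every term is shifted right to that common exponent before a single integer accumulation. The data-dependent shifts in the final step are what clash with the fixed bit-column mapping of most SRAM-CIM arrays.

```python
import struct

def bf16_fields(x: float):
    """Decompose a Python float into BF16-style (sign, biased exponent,
    mantissa-with-hidden-bit) fields: 1 sign, 8 exponent, 7 mantissa bits."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0] >> 16  # truncate FP32 to BF16
    sign = (bits >> 15) & 0x1
    exp = (bits >> 7) & 0xFF
    man = bits & 0x7F
    if exp != 0:            # normal number: restore the hidden leading 1
        man |= 0x80
    return sign, exp, man

def fp_mac_shift_align(ins, ws):
    """BF16-style dot product via exponent comparison and mantissa
    shift-alignment, followed by one integer accumulation, mimicking the
    dataflow of a conventional digital FP-MAC (subnormals ignored)."""
    terms = []
    for a, b in zip(ins, ws):
        sa, ea, ma = bf16_fields(a)
        sb, eb, mb = bf16_fields(b)
        sign = -1 if (sa ^ sb) else 1
        terms.append((sign, ma * mb, ea + eb))  # 16b mantissa product, doubly-biased exponent
    emax = max(e for _, _, e in terms)          # largest product exponent
    # Data-dependent right shifts align every mantissa product to emax;
    # these per-term shift amounts are what conflict with the fixed,
    # structured bit-column mapping of an SRAM-CIM array.
    acc = sum(s * (p >> (emax - e)) for s, p, e in terms)
    # Undo the two exponent biases (2*127) and the two 7-bit mantissa scalings.
    return acc * 2.0 ** (emax - 2 * 127 - 14)

print(fp_mac_shift_align([1.5, -0.25, 3.0], [0.5, 2.0, -1.0]))  # -2.75
```

On these example operands the sketch returns -2.75, matching the exact sum of products (0.75 - 0.5 - 3.0); note that in an INT-only CIM macro the alignment shifts disappear entirely, since every bit column carries a fixed power-of-two significance.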