An area-efficient 6T-SRAM based Compute-In-Memory architecture with reconfigurable SAR ADCs for energy-efficient deep neural networks in edge ML applications
Compute-In-Memory (CIM) is a promising approach for enabling low-power Machine Learning (ML) applications on edge devices, since it significantly reduces data movement by embedding computations inside or near the memory, unlike traditional all-digital implementations. Conventional 6-transistor (6T) SRAM bit-cell based CIM approaches [1]–[3] suffer from a bit-cell disturb issue caused by accessing multiple cells in a column, which limits the dynamic voltage range allowed for analog dot-product (DP) computations. They are also highly prone to bit-cell discharge current (Icell) variation, degrading the overall accuracy of neural network (NN) inference. Alternative approaches, e.g. [4], require a custom-designed 10T bit-cell that consumes 2-3x larger cell area. To address these challenges, we present an area-efficient CIM approach (CIM-D6T) that uses compact 6T foundry bit-cells while achieving robustness to bit-cell Vt variation and eliminating read-disturb issues, improving the dynamic voltage range for DP. This is achieved by decoupling the 6T cell read from the analog DP computation. As shown in Fig. 1, a pair of extra metal capacitors (Cm) connected to the lines XAp and XAn is added over the SRAM column to store and process the analog voltages for the DPs. The 6T cells in a row are read locally, and the read data values are used in the local LRW+MAVa circuit to discharge the analog voltage on the XAp/XAn capacitor to ground. These extra capacitors consume no additional silicon area, since they are implemented as metal comb capacitors over the existing SRAM array using higher metal layers.

Fig. 1 shows the overall architecture of the proposed CIM half-array with 256x64 6T bit-cells, split into 16 sub-arrays of 16 rows and 64 columns each. Weights for the different 3D filters in a given NN layer (along the output-channel dimension) are mapped to different sub-arrays.
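The decoupled read-then-compute step can be illustrated with a simple behavioral model. This is our own sketch, not the paper's circuit: it assumes each locally read weight bit conditionally discharges the input voltage held on the Cm capacitor of its XAp/XAn line, so the charge-shared average across a column encodes a binary-weighted analog dot product. The function name and the exact discharge polarity are illustrative assumptions.

```python
# Hedged behavioral sketch of the CIM-D6T compute step (illustration only,
# not the paper's actual circuit): a '1' weight bit discharges the analog
# input voltage stored on the metal capacitor Cm to ground; the remaining
# voltages are then charge-shared across the column.
def column_dot_product(va_inputs, weight_bits):
    """Charge-sharing model: each Cm holds an input voltage Va; cells whose
    locally read weight bit is 1 pull that voltage to 0 V. Averaging the
    surviving voltages gives a value proportional to sum(va * (1 - w))."""
    assert len(va_inputs) == len(weight_bits)
    remaining = [va if w == 0 else 0.0
                 for va, w in zip(va_inputs, weight_bits)]
    # Equal capacitors share charge, so the settled voltage is the mean.
    return sum(remaining) / len(remaining)

# Example: three inputs with weights [1, 0, 1] -> only the second survives.
result = column_dot_product([0.5, 1.0, 0.25], [1, 0, 1])
```

Because the read happens locally before any charge is moved, the bit-cell never sees the shared analog line, which is why the disturb-free full voltage swing is available for the dot product.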
A group of two local columns, each with 16 rows, forms one mux-ed local column (LCOLmx); hence, each sub-array has 32 parallel ports for input feature map (IFMP) values and weights. The LCOLmx's aligned along the vertical dimension share a single DAC, which converts a 6-b unsigned digital input (XIN[5:0]) to an analog voltage (0 to Vref). The same analog voltage (Va) is shared across all sub-arrays along a column.
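The DAC's input-to-voltage mapping implied by the text can be sketched as follows. The full-scale code mapping to exactly Vref (i.e. dividing by 63 rather than 64) is an assumption on our part; the paper only states that the 6-b unsigned code spans 0 to Vref.

```python
# Hedged sketch of the shared 6-b DAC assumed from the text: a 6-bit
# unsigned code XIN[5:0] (0..63) maps linearly onto [0, Vref].
# The exact full-scale convention (code 63 -> Vref) is an assumption.
def dac(xin: int, vref: float = 1.0) -> float:
    """Convert a 6-bit unsigned input code to an analog voltage Va."""
    assert 0 <= xin <= 63, "XIN[5:0] is a 6-bit unsigned code"
    return vref * xin / 63.0

# The resulting Va drives one LCOLmx position in every sub-array at once,
# matching the text's note that Va is shared down the column.
va = dac(42, vref=0.8)
```

Sharing one DAC per vertical LCOLmx group amortizes the converter's area and energy across all 16 sub-arrays, which is consistent with the abstract's area-efficiency claim.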