SRAM-Based Computation in Memory Architecture to Realize Single Command of Add-Multiply Operation and Multifunction

This paper presents a computation-in-memory (CIM) architecture and circuit design in which a single command executes addition, signed multiplication, and multi-function operations, addressing the poor computation throughput caused by the von Neumann bottleneck. The proposed CIM uses a 2T-Switch circuit, which needs only two switches to select the required computation units, so silicon area is reduced. An RCAM (ripple carry adder and multiply) unit realized with full-swing gate diffusion input (FS-GDI) logic in a single-ended disturb-free 7T SRAM further reduces power consumption and active circuit area. An auto-switching write-back circuit, consisting of a BL auto-switching circuit, a data switching circuit, and a WL auto-switching circuit, automatically restores the results of addition and multiplication to designated memory addresses. The proposed CIM is realized in a 40-nm CMOS process and demonstrates 12.18/28.19 fJ/bit normalized write/read energy at a 100 MHz system clock rate.
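As a purely behavioral illustration of the arithmetic an RCAM-style unit performs, the following sketch models ripple-carry addition from one-bit full adders and builds a signed multiply on top of it. It is not the paper's circuit: the function names, the 8-bit default width, and the sign-magnitude shift-and-add scheme are all assumptions chosen for clarity, since the abstract does not specify the multiplication algorithm.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, width=8):
    """Add two unsigned width-bit values bit by bit.

    Returns (sum modulo 2**width, final carry_out).
    The carry ripples from bit 0 up to bit width-1.
    """
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


def signed_multiply(x, y, width=8):
    """Multiply two signed width-bit integers (hypothetical scheme).

    Assumption: sign-magnitude shift-and-add, accumulating partial
    products with ripple_carry_add in a 2*width-bit accumulator.
    """
    negative = (x < 0) != (y < 0)
    a, b = abs(x), abs(y)
    out_width = 2 * width
    product = 0
    for i in range(width):
        if (b >> i) & 1:
            product, _ = ripple_carry_add(product, a << i, out_width)
    return -product if negative else product


if __name__ == "__main__":
    print(ripple_carry_add(200, 100))   # 8-bit sum wraps, carry-out set
    print(signed_multiply(-7, 5))
```

In this model the addition wraps modulo 2^width and reports the carry-out separately, while the multiplier keeps a double-width accumulator so that the product of two width-bit operands never overflows.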
