Computational Failure Analysis of In-Memory RRAM Architecture for Pattern Classification CNN Circuits

Power-efficient data-processing subsystems capable of millions of complex concurrent arithmetic operations per second are an essential part of meeting the growing demands of edge-computing applications, given the volume of data collected by real-time Internet-of-Things (IoT) sensors. In addition, in-memory computation, which integrates memory and processing elements on a single wafer, promises significant computational power savings by avoiding the memory-wall bottleneck incurred when accessing the memory array. Resistive RAM (RRAM), with its simple metal-insulator-metal (MIM) structure, is a very appealing candidate for in-memory computation, given its ultralow switching power and its compatibility with the Complementary Metal-Oxide-Semiconductor (CMOS) fabrication process. However, despite these advantages, the resistive switching (RS) phenomenon in RRAM exhibits inherent stochastic variability. On the algorithmic side, convolutional neural networks (CNNs) have gained popularity in image-classification applications, and their architecture is memory-intensive in nature, since the trained weights must be stored. Hence, an RRAM-based CNN system would pave the way for a power-efficient image-classification system on the edge. Accounting, however, for the inherent variability in RRAM (inter-device and intra-device), the accuracy of the CNN's predictions is expected to drop. This motivates us to quantify the impact of RRAM variability on the CNN's trained weights and classification accuracy (prediction loss). In this study, we construct a Look-Up-Table (LUT) based model for encoding the resistive variability of a wide-current-compliance (<inline-formula> <tex-math notation="LaTeX">$2~\mu \text{A}$ </tex-math></inline-formula> to <inline-formula> <tex-math notation="LaTeX">$250~\mu \text{A}$ </tex-math></inline-formula>) 65nm CMOS 1T1R OxRAM (TiN/HfO<sub>2</sub>/Hf/TiN) into the CNN's trained weights in a digital regime.
The RRAM-resistance-encoded trained weights are then used to simulate two extreme CNN architectures, namely, a Fully Serial System (FSS) and a Fully Parallel System (FPS). Each architecture's prediction-variability trend is quantified as a function of the current compliance, the RRAM resistive variability, the CNN's convolution matrix sizes (<inline-formula> <tex-math notation="LaTeX">$5\times 5$ </tex-math></inline-formula>, <inline-formula> <tex-math notation="LaTeX">$3\times 3$ </tex-math></inline-formula>, <inline-formula> <tex-math notation="LaTeX">$1\times 1$ </tex-math></inline-formula>, and <inline-formula> <tex-math notation="LaTeX">$1\times 1$ </tex-math></inline-formula> max pool), the total number of layers in the CNN, and the input image pixel size.
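The LUT-based weight-encoding idea described above can be sketched as follows. This is a minimal illustrative model, not the paper's implementation: the LUT entries (mean resistance and log-normal spread per state) are placeholder values, not measured 65nm 1T1R OxRAM data, and the 1-bit quantization of the weights is an assumption made for brevity.

```python
import numpy as np

# Hypothetical LUT: for each programmed state, (mean resistance in ohms,
# log-normal sigma) at some fixed current compliance. Placeholder numbers.
LUT = {
    0: (1e6, 0.30),  # high-resistance state
    1: (1e4, 0.10),  # low-resistance state
}

def encode_weights(weights, lut, rng):
    """Map quantized CNN weights onto sampled RRAM conductances.

    Each weight level indexes the LUT; a resistance is drawn from a
    log-normal distribution to mimic intra-device variability, and the
    perturbed weight is re-derived from the resulting conductance.
    """
    # Crude 1-bit quantization (illustrative assumption, not the paper's scheme)
    levels = (weights > weights.mean()).astype(int)
    cond = np.empty_like(weights, dtype=float)
    for lvl, (mean_r, sigma) in lut.items():
        mask = levels == lvl
        # Sample resistances centred on the LUT mean for this state
        r = rng.lognormal(np.log(mean_r), sigma, size=int(mask.sum()))
        cond[mask] = 1.0 / r  # conductance carries the weight magnitude
    # Rescale conductances back into the original weight range
    cond = (cond - cond.min()) / (cond.max() - cond.min())
    return cond * (weights.max() - weights.min()) + weights.min()

rng = np.random.default_rng(0)
w = rng.standard_normal((3, 3))          # a toy 3x3 convolution kernel
w_noisy = encode_weights(w, LUT, rng)    # variability-perturbed weights
```

Running such an encoding over every kernel of a trained network, once per Monte-Carlo trial, is one way the spread in classification accuracy could then be measured.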