RRAM-Based Neuromorphic Hardware Reliability Improvement by Self-Healing and Error Correction

Neural networks (NNs) have been a key enabler of many successful AI applications. Because the von Neumann architecture is inefficient for NN computation, researchers have been investigating new semiconductor devices and architectures for neuromorphic computing. The crossbar RRAM, an emerging non-volatile memory built from memristor devices, can be used to accelerate or emulate NN computation. However, memristor device defects introduced during manufacturing or field use may degrade NN performance, raising reliability concerns for the neuromorphic hardware. In this paper, we consider two existing fault models for the 1T1R RRAM cell: the stuck-at fault and the transistor stuck-on fault. Evaluating their influence on the NN shows that with about 10% faulty cells in the memristor array, the accuracy of the MLP model degrades by about 10%, while that of LeNet-300-100 and LeNet-5 degrades by more than 65%. We therefore propose a self-healing approach and an error correction approach to reduce the accuracy degradation and improve the reliability (lifetime) of the neuromorphic hardware. Our simulation results show that, if the accuracy degradation is limited to within 5%, the proposed error correction approach can tolerate up to 40% faulty cells for the MLP model, and up to 60% faulty cells for the LeNet-300-100 and LeNet-5 models. The error correction method can also extend the lifetime of the neuromorphic hardware by 5% or more.
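To make the evaluation setup concrete, the sketch below shows one plausible way to inject the two fault models into a crossbar conductance map before measuring NN accuracy. This is a minimal illustration, not the paper's implementation: the function name, the array size, and the mapping of a transistor stuck-on fault to maximum cell conductance are assumptions made for the example.

```python
import numpy as np

def inject_faults(weights, fault_rate, rng, g_min=0.0, g_max=1.0):
    """Return a copy of `weights` with a fraction `fault_rate` of cells faulty.

    Stuck-at faults pin a cell to its minimum or maximum conductance.
    A transistor stuck-on fault is modeled here (an assumption for this
    sketch) as the cell likewise being stuck at maximum conductance.
    """
    faulty = weights.copy()
    n_cells = weights.size
    n_faulty = int(fault_rate * n_cells)
    # Pick distinct faulty cell positions uniformly at random.
    idx = rng.choice(n_cells, size=n_faulty, replace=False)
    half = n_faulty // 2
    flat = faulty.reshape(-1)
    flat[idx[:half]] = g_min   # stuck-at-0: minimum conductance
    flat[idx[half:]] = g_max   # stuck-at-1 / transistor stuck-on: maximum conductance
    return faulty

rng = np.random.default_rng(0)
w = rng.uniform(0.0, 1.0, size=(128, 128))     # toy crossbar conductance map
w_faulty = inject_faults(w, fault_rate=0.10, rng=rng)
print(f"fraction of cells changed: {np.mean(w != w_faulty):.2%}")
```

Running the faulty conductance map through the NN's inference path and comparing test accuracy against the fault-free baseline reproduces the kind of degradation curve the paper reports (e.g., accuracy versus fault rate at 10%, 40%, 60% faulty cells).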
