Stuck-at Fault Tolerance in RRAM Computing Systems

Emerging metal-oxide resistive switching random-access memory (RRAM) devices and RRAM crossbars have demonstrated their potential to boost the speed and energy efficiency of analog matrix-vector multiplication. However, because the fabrication technology is still immature, commonly occurring stuck-at faults (SAFs) seriously degrade the computational accuracy of an RRAM-based computing system (RCS). In this paper, we present a fault-tolerant framework for RCS. A mapping algorithm with inner fault tolerance is proposed to convert matrix parameters into RRAM conductances in the RCS and to tolerate SAFs by fully exploring the available mapping space. Two baseline redundancy schemes are proposed to keep the RCS effective when the percentage of faulty RRAM cells is high. To reduce the number of redundant RRAM cells when the SAFs follow a non-uniform or an unknown distribution, a distribution-aware redundancy scheme and a re-configurable redundancy scheme are proposed to provide dynamic fault tolerance. Simulation results show that the baseline redundancy schemes can raise the recognition accuracy on the MNIST data set to almost the same level as in the RRAM-fault-free case, with an energy overhead of approximately 30%. When SAFs follow a non-uniform or an unknown distribution, the distribution-aware and re-configurable schemes can reduce the number of redundant RRAM cells from more than 200% to less than 40% and less than 60%, respectively, without reducing the recognition accuracy.
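The idea of tolerating SAFs by searching the mapping space can be illustrated with a small sketch. This is not the algorithm proposed in the paper; it is a minimal brute-force example under several assumptions: weights are normalized to [0, 1], a cell labelled SA0 is stuck at the lowest conductance (stored value 0.0), a cell labelled SA1 is stuck at the highest conductance (stored value 1.0), and the only freedom explored is which physical crossbar row holds each matrix row (the inputs are assumed to be rerouted accordingly, so the product is unchanged). All function names are hypothetical.

```python
from itertools import permutations

# Illustrative fault labels (assumption): SA0 = stuck at lowest conductance,
# SA1 = stuck at highest conductance.
SA0, SA1 = "SA0", "SA1"


def apply_faults(weights, faults, perm):
    """Return the values actually stored when matrix row i is placed on
    physical row perm[i]. `faults` maps (physical_row, column) to SA0 or SA1;
    a faulty cell overrides whatever value was meant to be written."""
    stored = []
    for i, row in enumerate(weights):
        phys = perm[i]
        stored_row = []
        for j, w in enumerate(row):
            fault = faults.get((phys, j))
            if fault == SA0:
                stored_row.append(0.0)
            elif fault == SA1:
                stored_row.append(1.0)
            else:
                stored_row.append(w)
        stored.append(stored_row)
    return stored


def mapping_error(weights, faults, perm):
    """Sum of absolute differences between intended and stored weights."""
    stored = apply_faults(weights, faults, perm)
    return sum(
        abs(s - w)
        for stored_row, row in zip(stored, weights)
        for s, w in zip(stored_row, row)
    )


def best_row_mapping(weights, faults):
    """Exhaustively search row permutations; only feasible for tiny matrices."""
    n = len(weights)
    return min(
        permutations(range(n)),
        key=lambda perm: mapping_error(weights, faults, perm),
    )


if __name__ == "__main__":
    weights = [[0.0, 0.9], [1.0, 0.1], [0.5, 0.5]]
    faults = {(0, 0): SA1, (2, 1): SA0}  # hypothetical fault map
    identity = (0, 1, 2)
    best = best_row_mapping(weights, faults)
    print("naive error:", mapping_error(weights, faults, identity))
    print("best mapping:", best, "error:", mapping_error(weights, faults, best))
```

In this toy case the search places the row whose first weight is already 1.0 on the physical row with the SA1 cell, so that fault costs nothing. Exhaustive search grows factorially with the number of rows, so a practical mapper would need a heuristic, and redundancy would still be required when the fault rate is high, as the abstract notes.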
