SIERRA - Simulation environment for memory redundancy algorithms

Extreme-scale computer systems take advantage of large arrays of general-purpose multicore processors coupled with specialized manycore accelerators. To support complex applications and keep these processing elements fed, increasingly large memory cores are integrated at different levels of the hierarchy. At the same time, the adoption of increasingly aggressive manufacturing processes makes the memory subsystem particularly sensitive to faults. Error correcting codes (ECCs) allow the memory to recover from faults at run time without interfering with application execution. Because every corrected error costs performance, however, persistent faults require a more radical repair approach in which faulty cells are physically replaced by spare ones. Memory redundancy analysis (MRA) algorithms drive the allocation of these spare resources. Many one-dimensional and two-dimensional MRAs have been proposed, but tools for evaluating their repair capability are still not well established. This paper presents SIERRA, a simulation environment for precisely evaluating the repair efficiency of an MRA under different fault signatures and faulty memory configurations. The simulation engine estimates the quality of an MRA by analyzing its behavior on several faulty memory configurations, taking into account parameters such as the area of the memory blocks and the defect density. The quality evaluation covers the MRA's repair capability, the power consumed by its execution, and its area overhead. Because simulation information is stored in a database, the tool can speed up the simulation process by distributing it among several nodes. These features make SIERRA well suited to supporting the design of next-generation high-performance computers.
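The kind of evaluation described above can be illustrated with a minimal Monte Carlo sketch. It is not SIERRA's implementation: the function names (`sample_fault_count`, `random_fault_map`, `greedy_repair`, `repair_rate`), the Poisson fault model, the uniform placement of single-cell faults, and the toy greedy allocator are all assumptions made for illustration. The sketch draws random faulty memory configurations from a defect density and a block area, runs a two-dimensional MRA on each, and reports the fraction of configurations that were fully repaired.

```python
import math
import random
from collections import Counter


def sample_fault_count(rng, defect_density, area):
    """Draw a Poisson fault count with mean defect_density * area (Knuth's method).

    Assumption: faults are independent, so their number is Poisson distributed.
    Suitable only for small means, since exp(-mean) underflows for large ones.
    """
    limit = math.exp(-defect_density * area)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def random_fault_map(rng, rows, cols, defect_density, area):
    """Return a set of (row, col) faulty cells placed uniformly at random."""
    n = min(sample_fault_count(rng, defect_density, area), rows * cols)
    cells = rng.sample(range(rows * cols), n)
    return {(cell // cols, cell % cols) for cell in cells}


def greedy_repair(faults, spare_rows, spare_cols):
    """Toy 2D MRA (hypothetical): cover the line holding the most faults first.

    Returns True if every fault is covered by a spare row or spare column.
    This is a heuristic, so it may fail on maps an exhaustive search could repair.
    """
    remaining = set(faults)
    while remaining:
        row_hits = Counter(r for r, _ in remaining)
        col_hits = Counter(c for _, c in remaining)
        best_row, row_n = row_hits.most_common(1)[0]
        best_col, col_n = col_hits.most_common(1)[0]
        if spare_rows > 0 and (row_n >= col_n or spare_cols == 0):
            spare_rows -= 1
            remaining = {(r, c) for r, c in remaining if r != best_row}
        elif spare_cols > 0:
            spare_cols -= 1
            remaining = {(r, c) for r, c in remaining if c != best_col}
        else:
            return False  # faults remain but no spares are left
    return True


def repair_rate(mra, trials, rows, cols, defect_density, area,
                spare_rows, spare_cols, seed=0):
    """Fraction of randomly generated faulty memories that `mra` fully repairs."""
    rng = random.Random(seed)
    repaired = 0
    for _ in range(trials):
        faults = random_fault_map(rng, rows, cols, defect_density, area)
        repaired += mra(faults, spare_rows, spare_cols)
    return repaired / trials


if __name__ == "__main__":
    rate = repair_rate(greedy_repair, trials=1000, rows=64, cols=64,
                       defect_density=0.5, area=8.0,
                       spare_rows=2, spare_cols=2)
    print(f"estimated repair rate: {rate:.3f}")
```

Passing the MRA as a function argument lets several algorithms be compared on identically seeded fault maps. Because the trials are independent, they could also be split across worker nodes, which is the property the abstract relies on for distributed simulation. Power consumption and area overhead, the other two quality metrics named above, are not modeled here.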
