Memory design for selective error protection

Memory error protection is increasingly important as memory density and capacity continue to scale. This paper presents a memory selective error protection (SEP) design that enables SEP on commodity memory modules, with no change to the modules or devices. Error protection is provided through embedded ECC, a recently proposed, energy-efficient ECC memory organization. The memory SEP design splits the physical memory address space into two regions of adjustable size, one with error protection and one without. With this support, the OS can adjust the size ratio of the protected and non-protected regions to match the needs of applications. Under this scheme, the mapping from a physical memory address to memory device addresses is no longer power-of-two based. New and efficient address mapping schemes based on the Chinese Remainder Mapping are proposed to avoid complex Euclidean division. Simulation results show that the memory SEP design can retain memory performance and reduce the increase in memory power while providing ECC protection to commodity memory modules.
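To make the division-free idea concrete, the following is a minimal sketch of a Chinese-remainder-style mapping. It is not the paper's actual mapping. The function names `mod_mersenne` and `crt_map` and the moduli 2^k and 2^n - 1 are assumptions chosen for illustration. Because 2^k and 2^n - 1 are coprime, the pair of residues identifies each block index uniquely within a range of 2^k * (2^n - 1) blocks, and both residues need only masks, shifts, and adds.

```python
def mod_mersenne(x: int, n: int) -> int:
    """Return x mod (2**n - 1) for x >= 0 using only shifts, masks, and adds.

    Relies on 2**n being congruent to 1 modulo 2**n - 1, so the high bits
    can be folded onto the low bits without a division.
    """
    m = (1 << n) - 1
    while x > m:
        x = (x & m) + (x >> n)  # fold high part onto low part
    return 0 if x == m else x   # m itself is congruent to 0


def crt_map(block: int, k: int, n: int) -> tuple:
    """Map a block index to (block mod 2**k, block mod (2**n - 1)).

    Hypothetical illustration: the first residue is a plain bit-select,
    the second is computed by folding. The moduli are coprime, so the
    mapping is one-to-one over 2**k * (2**n - 1) consecutive indices.
    """
    return block & ((1 << k) - 1), mod_mersenne(block, n)


if __name__ == "__main__":
    k, n = 3, 3
    span = (1 << k) * ((1 << n) - 1)
    pairs = {crt_map(b, k, n) for b in range(span)}
    print(len(pairs) == span)
```

With `k = 3` and `n = 3`, the 56 consecutive block indices map to 56 distinct residue pairs, which is the property that lets a non-power-of-two layout be addressed without a hardware divider.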
