Enabling energy efficient reliability in embedded systems through smart cache cleaning

Incessant and rapid technology scaling has brought us to a point where today's and future transistors are susceptible to soft errors: transient errors induced by energy-carrying particles. Within a processor, the sheer size of the caches and the nature of the data they hold make them the component most vulnerable to electrical interference with stored data. Data in the cache is vulnerable to corruption by soft errors for as long as it remains unused in the cache. Write-through and early-write-back [Li et al. 2004] cache configurations reduce the time for which data in the cache is vulnerable, at the cost of more memory writes and therefore more energy. We propose a smart cache cleaning methodology that copies only specific vulnerable cache blocks into memory at chosen times, thereby protecting the data cache with minimal memory writes. In this work, we first propose a hybrid (software-hardware) methodology. We then propose an improved software solution that uses the cache write-back functionality available in commodity processors, thereby reducing the hardware overhead required to implement smart cache cleaning on such systems. The parameters of our Smart Cache Cleaning (SCC) technique provide a means of customizable, energy-efficient soft error reduction in the L1 data cache. Given the system's reliability requirements, its power budget, and the runtime priority of the application, the SCC parameters can be tuned to trade off power consumption against L1 data cache reliability. Our experiments on the LINPACK and Livermore benchmarks demonstrate a 26% lower energy-vulnerability product (that is, energy-efficient vulnerability reduction) than hardware-based cache reliability techniques. Our software-only solution achieves the same level of reliability with an additional 28% performance improvement.
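The trade-off between vulnerable time and memory writes can be illustrated with a minimal sketch. This is a toy model under stated assumptions, not the SCC implementation or the paper's vulnerability metric: it tracks a single cache block, counts a cycle as vulnerable whenever the block is dirty, and charges one unit of energy per memory write. The names `simulate`, `clean_interval`, and `energy_per_write` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    vulnerable_cycles: int  # cycles during which the block held dirty data
    memory_writes: int      # write-backs issued to memory


def simulate(write_cycles, end_cycle, clean_interval=None):
    """Toy model of one cache block (assumption: dirty == vulnerable).

    write_cycles: cycles at which the processor writes the block.
    end_cycle: cycle at which the block is evicted.
    clean_interval: if set, a dirty block is written back (cleaned) at
        every cycle that is a multiple of this value; None models a
        plain write-back cache that only writes back on eviction.
    """
    writes = set(write_cycles)
    dirty = False
    vulnerable = 0
    mem_writes = 0
    for cycle in range(end_cycle):
        if cycle in writes:
            dirty = True
        elif dirty and clean_interval and cycle % clean_interval == 0:
            dirty = False       # cleaning: copy the block to memory
            mem_writes += 1
        if dirty:
            vulnerable += 1
    if dirty:
        mem_writes += 1         # dirty block is written back on eviction
    return Result(vulnerable, mem_writes)


def energy_vulnerability_product(result, energy_per_write=1.0):
    """Hypothetical cost model: energy is proportional to memory writes."""
    return result.vulnerable_cycles * result.memory_writes * energy_per_write


if __name__ == "__main__":
    trace = [0, 1, 2, 3, 100]  # a burst of writes, then one late write
    for interval in (None, 10, 50):
        r = simulate(trace, end_cycle=200, clean_interval=interval)
        print(interval, r, energy_vulnerability_product(r))
```

In this toy trace, cleaning every 10 cycles leaves the block dirty for 20 cycles at the cost of 2 memory writes, whereas plain write-back leaves it dirty for all 200 cycles with a single write on eviction. Choosing when and which blocks to clean is the knob that the abstract describes as trading energy against reliability.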

[1]  Tryggve Fossum, et al.  Cache scrubbing in microprocessors: myth or necessity?, 2004, 10th IEEE Pacific Rim International Symposium on Dependable Computing, 2004. Proceedings.

[2]  Wei Zhang, et al.  Computing and Minimizing Cache Vulnerability to Transient Errors, 2009, IEEE Design & Test of Computers.

[3]  Aviral Shrivastava, et al.  Partially Protected Caches to Reduce Failures Due to Soft Errors in Multimedia Applications, 2009, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[4]  Hai Zhou, et al.  Parallel CAD: Algorithm Design and Programming Special Section Call for Papers TODAES: ACM Transactions on Design Automation of Electronic Systems, 2010.

[5]  Jeffrey T. Draper, et al.  Critical Charge Characterization for Soft Error Rate Modeling in 90nm SRAM, 2007, 2007 IEEE International Symposium on Circuits and Systems.

[6]  Todd M. Austin, et al.  The SimpleScalar tool set, version 2.0, 1997, CARN.

[7]  R. Baumann, et al.  Boron compounds as a dominant source of alpha particles in semiconductor devices, 1995, Proceedings of 1995 IEEE International Reliability Physics Symposium.

[8]  Mehdi Baradaran Tahoori, et al.  Reducing Data Cache Susceptibility to Soft Errors, 2006, IEEE Transactions on Dependable and Secure Computing.

[9]  Shuichi Sakai, et al.  Utilization of SECDED for Soft Error and Variation-Induced Defect Tolerance in Caches, 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.

[10]  Shuichi Sakai, et al.  Utilization of SECDED for soft error and variation-induced defect tolerance in caches, 2007.

[11]  E. Cannon, et al.  SRAM SER in 90, 130 and 180 nm bulk and SOI technologies, 2004, 2004 IEEE International Reliability Physics Symposium. Proceedings.

[12]  Mahmut T. Kandemir, et al.  Soft error and energy consumption interactions: a data cache perspective, 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[13]  Ed Anderson, et al.  LAPACK Users' Guide, 1995.

[14]  Richard W. Hamming, et al.  Error detecting and error correcting codes, 1950.

[15]  Aviral Shrivastava, et al.  Partitioning techniques for partially protected caches in resource-constrained embedded systems, 2010, TODE.

[16]  Changhong Dai, et al.  Impact of CMOS process scaling and SOI on the soft error rates of logic processes, 2001, 2001 Symposium on VLSI Technology. Digest of Technical Papers (IEEE Cat. No.01 CH37184).

[17]  Aviral Shrivastava, et al.  Cache vulnerability equations for protecting data in embedded processor caches from soft errors, 2010, LCTES '10.

[18]  E. Ibe, et al.  Impact of Scaling on Neutron-Induced Soft Error in SRAMs From a 250 nm to a 22 nm Design Rule, 2010, IEEE Transactions on Electron Devices.

[19]  G. Chen, et al.  Compiler-directed selective data protection against soft errors, 2005, ASP-DAC '05.

[20]  T. May, et al.  Alpha-particle-induced soft errors in dynamic memories, 1979, IEEE Transactions on Electron Devices.

[21]  Jin-Fu Li, et al.  An error detection and correction scheme for RAMs with partial-write function, 2005, 2005 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT'05).

[22]  David Seal, et al.  ARM Architecture Reference Manual, 2001.

[23]  Leonard R. Rockett Jr.  Simulated SEU hardened scaled CMOS SRAM cell design using gated resistors, 1992.

[24]  Joel Emer, et al.  A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor, 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.

[25]  Aviral Shrivastava, et al.  Enabling energy efficient reliability in embedded systems through smart cache cleaning, 2013.

[26]  Aviral Shrivastava, et al.  Compilation techniques for energy reduction in horizontally partitioned cache architectures, 2005, CASES '05.

[27]  Sammy Kayali.  Reliability consideration for advanced microelectronics, 2000, Proceedings. 2000 Pacific Rim International Symposium on Dependable Computing.