Performance and Energy Efficient Asymmetrically Reliable Caches for Multicore Architectures

Modern architectures are increasingly susceptible to transient and permanent faults because of continuously shrinking transistor sizes and higher operating frequencies. Soft errors are relatively likely to occur in cache structures, because caches occupy a large share of the chip area compared to other components. Applying fault tolerance indiscriminately to all caches imposes a significant performance and energy overhead. In this study, we propose asymmetrically reliable caches, which aim to provide the required reliability with just enough extra hardware to stay within performance and energy constraints. In our framework, a chip multiprocessor consists of one reliability-aware core, whose data cache is ECC-protected and holds critical data, and a set of less reliable cores with unprotected data caches, to which noncritical data is mapped. Experimental results for selected applications show that the proposed technique provides 21% better reliability for only 6% more energy consumption compared to traditional caches.
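The mapping described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Core` class, the `map_data` function, the per-block criticality flag, and the round-robin spreading of noncritical data are all hypothetical choices made for illustration. The abstract does not state how criticality is determined or how noncritical data is distributed among the unprotected cores.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical model of a core and the data mapped to its data cache."""
    core_id: int
    ecc_protected: bool
    assigned: list = field(default_factory=list)


def map_data(blocks, cores):
    """Map (name, is_critical) blocks onto cores.

    Critical blocks go to the ECC-protected core. Noncritical blocks are
    spread round-robin over the unprotected cores (an assumption, not a
    policy taken from the paper).
    """
    protected = [c for c in cores if c.ecc_protected]
    unprotected = [c for c in cores if not c.ecc_protected]
    if not protected or not unprotected:
        raise ValueError("need one protected core and at least one unprotected core")

    next_unprotected = 0
    for name, is_critical in blocks:
        if is_critical:
            protected[0].assigned.append(name)
        else:
            target = unprotected[next_unprotected % len(unprotected)]
            target.assigned.append(name)
            next_unprotected += 1
    return {c.core_id: list(c.assigned) for c in cores}


if __name__ == "__main__":
    cores = [Core(0, True), Core(1, False), Core(2, False)]
    blocks = [("a", True), ("b", False), ("c", False), ("d", True), ("e", False)]
    print(map_data(blocks, cores))
```

In this sketch only core 0 pays the ECC cost, which reflects the asymmetric idea: protection is provisioned once and critical data is steered toward it, instead of protecting every cache.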
