Soft error-aware architectural exploration for designing reliability adaptive cache hierarchies in multi-cores

Mainstream multi-core processors employ large multi-level on-chip caches, which makes them highly susceptible to soft errors. We demonstrate that designing a reliable cache hierarchy requires understanding the vulnerability interdependencies across different cache levels. This involves vulnerability analyses that depend on the parameters of the different cache levels (partition size, line size, etc.) and on the corresponding cache access patterns of different applications. This paper presents a novel soft error-aware cache architectural space exploration methodology and a vulnerability analysis of multi-level caches that considers their vulnerability interdependencies. Our technique significantly reduces exploration time while providing reliability-efficient cache configurations. We also show its applicability and benefits for ECC-protected caches under multi-bit fault scenarios.
