Reliability in the Shadow of Long-Stall Instructions

Soft errors due to cosmic rays are now a major concern for both computer manufacturers and end users. Because of continually shrinking silicon manufacturing processes and greater chip integration, the chip- and system-level soft error rate (SER) is projected to keep increasing for the foreseeable future. In response, chip manufacturers have long designed cache structures (generally the largest on-chip structures in a high-performance processor) to include protection techniques such as parity or ECC. To meet SER targets, new chip designs are starting to incorporate such protection on physical register files as well. As the soft error rate increases, however, large processor pipeline structures will also require protection against soft errors. Unfortunately, many of these pipeline structures are latency-critical, needing to complete multiple accesses per processor cycle. Many of them are therefore ill-suited to protection techniques such as ECC, which can add latency to each access. In modern processors, these structures typically contain in-flight instructions, which can vary in their vulnerability contribution, so uniform protection such as that provided by ECC may not be necessary. Our goal is to explore and explain some of the underlying causes of this variation in vulnerability among instructions. In this study, we examine the vulnerability contribution of instructions that are in flight during (in the shadow of) long-stall instructions, which defines the maximum potential benefit of techniques that exploit these stall cycles.
