Neutron-Induced Soft Errors in Graphic Processing Units

This paper presents and analyzes the results of neutron experiments on 40 nm Graphic Processing Units. We measure the cross sections of the internal memory resources and define a new threads cross section to characterize the sensitivity of the computing units to radiation. We experimentally evaluate the error rate of a matrix multiplication application and build an analytical model to predict the neutron-induced failures of algorithms.
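For readers unfamiliar with the term, a cross section in radiation testing is conventionally the number of observed events divided by the particle fluence delivered to the device. The sketch below illustrates that standard definition and the usual conversion to a failure rate in FIT (failures per 10^9 device-hours). It is not the paper's analytical model and does not reproduce its threads cross section; the function names, the error count, the fluence, and the reference flux of 13 n/(cm^2 h) are all assumptions chosen for illustration.

```python
def cross_section(observed_errors: int, fluence_n_per_cm2: float) -> float:
    """Cross section in cm^2: observed events per unit neutron fluence."""
    if fluence_n_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return observed_errors / fluence_n_per_cm2


def fit_rate(sigma_cm2: float, flux_n_per_cm2_h: float) -> float:
    """Expected failures per 10^9 device-hours at the given neutron flux."""
    return sigma_cm2 * flux_n_per_cm2_h * 1e9


# Illustrative numbers only (assumed, not taken from the paper).
sigma = cross_section(observed_errors=100, fluence_n_per_cm2=1e10)
# 13 n/(cm^2 h) is an assumed sea-level reference flux, used here as a placeholder.
print(sigma, fit_rate(sigma, flux_n_per_cm2_h=13.0))
```

Keeping the flux as an explicit parameter makes it easy to rescale the same measured cross section to other altitudes or environments.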
