Evaluating the effects of single event upsets in soft-core GPGPUs

Graphics Processing Units (GPUs) have become popular in a broad range of applications due to their high computational power and low cost. These applications include safety-critical ones, where fault tolerance is mandatory. This paper presents a fault injection methodology to evaluate a soft-core General Purpose GPU (GPGPU) design under Single Event Upsets (SEUs) in its register files. Different configurations of CUDA algorithms are explored to assess their impact on the GPU's behavior during a fault injection campaign. The paper also presents an error characterization analysis that inspects the GPU's memories and program counters to evaluate the real impact of each fault on the GPU, even when the fault does not cause an error in the final output of the system. The results can help designers develop fault-tolerant techniques more effectively.
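To make the idea of a register-file fault injection concrete, the following is a minimal Python sketch of a toy model, not the paper's actual framework or the soft-core GPGPU it targets. The register layout, the `run_kernel` stand-in, and the outcome labels (`masked`, `latent`, `output_error`) are all assumptions introduced for illustration. It flips one bit in a simulated register, reruns a trivial computation, and compares both the final output and the internal state against a fault-free run, so that a fault that corrupts state without changing the output is still detected.

```python
MASK32 = 0xFFFFFFFF


def flip_bit(value, bit, width=32):
    """Return `value` with one bit inverted, modeling a single event upset."""
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def run_kernel(registers):
    """Hypothetical stand-in for a kernel run on a 4-register file.

    r0 and r1 feed the visible output, r2 is copied into internal
    state that is never part of the output, and r3 is never read.
    """
    return {
        "out": (registers[0] + registers[1]) & MASK32,
        "scratch": registers[2],
    }


def classify(golden, faulty):
    """Compare a faulty run against the fault-free (golden) run."""
    if faulty == golden:
        return "masked"        # no observable difference anywhere
    if faulty["out"] == golden["out"]:
        return "latent"        # internal state differs, output is correct
    return "output_error"      # the final output is wrong


def inject(registers, reg, bit):
    """Flip one bit of one register, rerun, and classify the outcome."""
    faulty_regs = list(registers)
    faulty_regs[reg] = flip_bit(faulty_regs[reg], bit)
    return classify(run_kernel(registers), run_kernel(faulty_regs))


def campaign(registers, width=32):
    """Exhaustively inject every (register, bit) pair and count outcomes."""
    counts = {"masked": 0, "latent": 0, "output_error": 0}
    for reg in range(len(registers)):
        for bit in range(width):
            counts[inject(registers, reg, bit)] += 1
    return counts


if __name__ == "__main__":
    print(campaign([5, 7, 1, 0]))
```

In this toy model, checking only `out` would report faults in `r2` as harmless, whereas comparing the full state exposes them as latent corruption. This mirrors, in a very simplified way, the motivation for inspecting memories and program counters in addition to the final output.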
