Precise Cache Profiling for Studying Radiation Effects

Increased access to space has led to greater use of commodity processors in radiation environments. These processors are vulnerable to transient faults such as single-event upsets, which may cause bit-flips in processor components. Caches are particularly vulnerable because of their relatively large area, yet they are often omitted from fault injection testing: many processors do not provide direct access to cache contents, and simulators often do not fully model caches. The performance benefits of caches make disabling them undesirable, and error-correcting codes alone cannot correct the increasingly common multiple-bit upsets. This work explores building a program’s cache profile by collecting cache usage information at instruction granularity via commonly available on-chip debugging interfaces. The profile provides a tighter bound on cache vulnerability estimates than cache utilization does (50% for several benchmarks). It can be applied to reduce the number of fault injections required to characterize behavior by at least two-thirds for the benchmarks we examine. The profile also enables future work on hardware fault injection for caches that avoids the biases of existing techniques.
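To illustrate why a per-access profile can bound vulnerability more tightly than utilization, the following is a minimal sketch and not the method used in this work. It assumes a hypothetical trace of `(cycle, line, op)` tuples, and it assumes a flipped bit matters only if the affected line is later read before being overwritten. Write-backs of dirty lines are ignored for simplicity. The function name `cache_profile` and the trace format are illustrative assumptions.

```python
from collections import defaultdict


def cache_profile(trace, num_lines, total_cycles):
    """Return (utilization, vulnerability) for a hypothetical access trace.

    trace: iterable of (cycle, line, op) sorted by cycle,
           with op in {"fill", "read", "write"}.
    """
    last_event = {}                # line -> cycle of its most recent access
    vulnerable = defaultdict(int)  # line -> cycles in which a flip would later be read
    touched = set()                # lines used at least once

    for cycle, line, op in trace:
        touched.add(line)
        if op == "read" and line in last_event:
            # The data sat in the cache since the previous access and is now
            # consumed, so a flip anywhere in that window would be observed.
            vulnerable[line] += cycle - last_event[line]
        # A fill or write replaces the data, so the window before it is
        # treated as not vulnerable (assumption: no write-back modeled).
        last_event[line] = cycle

    utilization = len(touched) / num_lines
    vulnerability = sum(vulnerable.values()) / (num_lines * total_cycles)
    return utilization, vulnerability


if __name__ == "__main__":
    example = [
        (0, 0, "fill"), (10, 0, "read"), (20, 0, "write"),
        (50, 1, "fill"), (60, 1, "read"),
    ]
    print(cache_profile(example, num_lines=4, total_cycles=100))
```

In this toy trace, half of the lines are touched, but each is vulnerable for only a short window, so the time-weighted estimate is far smaller than the utilization figure. Under the same assumptions, fault injections could be restricted to those windows.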
