Formal Methods for Modelling and Analysis of Single-Event Upsets

When a high-energy particle such as a proton strikes a CPU, the impact may corrupt a data register on the CPU. Such a single-event upset (SEU), in which a random bit is flipped in the content of a data register, can lead to critical errors in the execution of a program. This is particularly problematic for security- or safety-critical systems, where such errors may have grave consequences. In this paper we develop a formal semantic framework for easy formal modelling of a large variety of SEUs in a core assembly language capturing the essential features of the ARM assembly language. We use this framework to formally prove the soundness of a static analysis enforcing so-called blue/green separation in a given program. Blue/green separation is a replication-based technique for making a program fault-tolerant with respect to data-flow SEUs; however, full coverage requires special hardware support. We further use our semantic framework to derive program fragments, so-called gadgets, for partial blue/green separation without special hardware. Finally, we illustrate how to apply statistical model checking in our framework to model and quantify faults that go well beyond data-flow SEUs, and to provide statistics on the level of fault tolerance of a program. We use this to provide evidence that our suggested program modifications significantly decrease the probability of such errors going undetected.
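The following minimal Python sketch illustrates the two ideas the abstract relies on: an SEU modelled as a single bit flip in a register, and replication of a computation in two disjoint register sets whose results are compared. It is not the paper's formal semantics, static analysis, or gadgets; the register names (`b0`, `b1`, `g0`, `g1`), the 32-bit register width, and the functions `flip_bit` and `run_duplicated` are assumptions introduced only for illustration.

```python
WORD_BITS = 32                 # assumed register width (32-bit, ARM-style)
MASK = (1 << WORD_BITS) - 1


def flip_bit(value, bit):
    """Model an SEU: flip one bit in the content of a register."""
    return (value ^ (1 << bit)) & MASK


def run_duplicated(x, y, seu=None):
    """Compute x + y twice, once in 'blue' and once in 'green' registers.

    `seu` is an optional (register_name, bit) pair. The upset is injected
    after the operands are loaded and before the additions are performed.
    Returns the sum, or raises RuntimeError if the two copies disagree.
    """
    # Hypothetical register file: b* hold the blue copy, g* the green copy.
    regs = {"b0": x & MASK, "b1": y & MASK, "g0": x & MASK, "g1": y & MASK}

    if seu is not None:
        name, bit = seu
        regs[name] = flip_bit(regs[name], bit)

    blue = (regs["b0"] + regs["b1"]) & MASK
    green = (regs["g0"] + regs["g1"]) & MASK

    # Compare the two copies before the result is allowed to escape.
    if blue != green:
        raise RuntimeError("fault detected: blue and green results differ")
    return blue
```

In this toy setting every single bit flip in an operand register is detected, because flipping bit k changes the affected sum by 2^k modulo 2^32, which is never zero. The sketch deliberately ignores faults that strike the comparison itself or the control flow, which is the kind of limitation that motivates the hardware-support caveat and the statistical analysis mentioned above.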
