Static typing for a faulty lambda calculus

A transient hardware fault occurs when an energetic particle strikes a transistor, causing it to change state. These faults do not cause permanent damage, but may result in incorrect program execution by altering signal transfers or stored values. While the likelihood that such transient faults will cause any significant damage may seem remote, over the last several years transient faults have caused costly failures in high-end machines at America Online, eBay, and the Los Alamos Neutron Science Center, among others [6, 44, 15]. Because susceptibility to transient faults is proportional to the size and density of transistors, the problem of transient faults will become increasingly important in the coming decades.

This paper defines the first formal, type-theoretic framework for studying reliable computation in the presence of transient faults. More specifically, it defines λzap, a lambda calculus that exhibits intermittent data faults. To detect and recover from these faults, λzap programs replicate intermediate computations and use majority voting, thereby modeling software-based fault tolerance techniques that have been studied extensively, but informally [10, 20, 30, 31, 32, 33, 41].

To ensure that programs maintain the proper invariants and use λzap primitives correctly, the paper defines a type system for the language. This type system guarantees that well-typed programs can tolerate any single data fault. To demonstrate that λzap can serve as an idealized typed intermediate language, we define a type-preserving translation from a standard simply-typed lambda calculus into λzap.
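The replicate-and-vote idea described above can be illustrated with a minimal sketch. This is not λzap itself, nor its semantics or type system: it is ordinary Python, and the names `majority`, `run_replicated`, `fault_in`, and `corrupt` are hypothetical helpers introduced only for illustration. The sketch assumes three replicas of a computation and at most one corrupted result, which is the single-fault setting the abstract describes.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def majority(a: T, b: T, c: T) -> T:
    """Return the value that at least two of the three replicas agree on."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    # With at most one fault this branch is unreachable; it signals that
    # the single-fault assumption was violated.
    raise RuntimeError("no majority: more than one replica disagrees")


def run_replicated(
    compute: Callable[[], T],
    fault_in: Optional[int] = None,          # hypothetical: index of the replica to corrupt
    corrupt: Optional[Callable[[T], T]] = None,  # hypothetical: models a data fault
) -> T:
    """Run `compute` three times, optionally corrupt one result, then vote."""
    results = [compute() for _ in range(3)]
    if fault_in is not None and corrupt is not None:
        results[fault_in] = corrupt(results[fault_in])
    return majority(results[0], results[1], results[2])


if __name__ == "__main__":
    # Flip one bit in the second replica's result; the vote still recovers 42.
    print(run_replicated(lambda: 6 * 7, fault_in=1, corrupt=lambda v: v ^ 4))
```

In this sketch a single corrupted replica is outvoted by the other two. What the paper adds, according to the abstract, is a static guarantee: its type system ensures that well-typed programs tolerate any single data fault, rather than relying on a runtime check like the one above.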

[1]  Irith Pomeranz,et al.  Transient-fault recovery using simultaneous multithreading , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.

[2]  Edward J. McCluskey,et al.  Software-implemented EDAC protection against SEUs , 2000, IEEE Trans. Reliab..

[3]  Matthias Felleisen,et al.  Abstract models of memory management , 1995, FPCA '95.

[4]  R. Baumann Soft errors in advanced semiconductor devices-part I: the three radiation sources , 2001 .

[5]  Marco Torchiano,et al.  A source-to-source compiler for generating dependable software , 2001, Proceedings First IEEE International Workshop on Source Code Analysis and Manipulation.

[6]  James L. Walsh,et al.  Field testing for cosmic ray soft errors in semiconductor memories , 1996, IBM J. Res. Dev..

[7]  David Walker,et al.  Stack-based typed assembly language , 1998, Journal of Functional Programming.

[8]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[9]  Eric Rotenberg,et al.  AR-SMT: a microarchitectural approach to fault tolerance in microprocessors , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).

[10]  Timothy J. Slegel,et al.  IBM's S/390 G5 microprocessor design , 1999, IEEE Micro.

[11]  Andrew W. Appel,et al.  A type-based compiler for standard ML , 1995, PLDI '95.

[12]  David I. August,et al.  SWIFT: software implemented fault tolerance , 2005, International Symposium on Code Generation and Optimization.

[13]  Ying C. Yeh Design considerations in Boeing 777 fly-by-wire computers , 1998, Proceedings Third IEEE International High-Assurance Systems Engineering Symposium (Cat. No.98EX231).

[14]  M. Rimen,et al.  Implicit signature checking , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[15]  Edward J. McCluskey,et al.  Control-flow checking by software signatures , 2002, IEEE Trans. Reliab..

[16]  Martín Abadi,et al.  A Theory of Secure Control Flow , 2005, ICFEM.

[17]  Martín Abadi,et al.  Control-flow integrity , 2005, CCS '05.

[18]  Shubhendu S. Mukherjee,et al.  Transient fault detection via simultaneous multithreading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[19]  Zhong Shao  An Overview of the FLINT/ML Compiler , 1997 .

[20]  Edward J. McCluskey,et al.  Error detection by duplicated instructions in super-scalar processors , 2002, IEEE Trans. Reliab..

[21]  George C. Necula,et al.  Safe kernel extensions without run-time checking , 1996, OSDI '96.

[22]  Cristiana Bolchini A software methodology for detecting hardware faults in VLIW data paths , 2003, IEEE Trans. Reliab..

[23]  Robert W. Horst,et al.  Multiple instruction issue in the NonStop cyclone processor , 1990, ISCA '90.

[24]  N. Hengartner,et al.  Predicting the number of fatal soft errors in Los Alamos national laboratory's ASC Q supercomputer , 2005, IEEE Transactions on Device and Materials Reliability.

[25]  Irith Pomeranz,et al.  Transient-fault recovery for chip multiprocessors , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[26]  Franklyn Turbak,et al.  Strongly Typed Flow-Directed Representation Transformations , 1997, ICFP 1997.

[27]  Edward J. McCluskey,et al.  Concurrent Error Detection Using Watchdog Processors - A Survey , 1988, IEEE Trans. Computers.

[28]  Joe B. Wells,et al.  Strongly typed flow-directed representation transformations (extended abstract) , 1997, ICFP '97.

[29]  Peter Lee,et al.  TIL: a type-directed, optimizing compiler for ML , 2004, SIGP.

[30]  David I. August,et al.  Automatic Instruction-Level Software-Only Recovery , 2006, IEEE Micro.

[31]  Edward J. McCluskey,et al.  Low Energy Error Detection Technique Using Procedure Call Duplication , 2001 .

[32]  Lorenzo Alvisi,et al.  Modeling the effect of technology trends on the soft error rate of combinational logic , 2002, Proceedings International Conference on Dependable Systems and Networks.

[33]  Edward J. McCluskey,et al.  Dependable adaptive computing systems-the ROAR project , 1998, SMC'98 Conference Proceedings. 1998 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.98CH36218).

[34]  Shubhendu S. Mukherjee,et al.  Detailed design and evaluation of redundant multithreading alternatives , 2002, ISCA.

[35]  John P. Hayes,et al.  Low-cost on-line fault detection using control flow assertions , 2003, 9th IEEE On-Line Testing Symposium, 2003. IOLTS 2003..

[36]  Andrew W. Appel,et al.  Using memory errors to attack a virtual machine , 2003, 2003 Symposium on Security and Privacy, 2003..

[37]  Andrew W. Appel,et al.  Foundational proof-carrying code , 2001, Proceedings 16th Annual IEEE Symposium on Logic in Computer Science.

[38]  Y. C. Yeh,et al.  Triple-triple redundant 777 primary flight computer , 1996, 1996 IEEE Aerospace Applications Conference. Proceedings.

[39]  Edward J. McCluskey,et al.  ED4I: Error Detection by Diverse Data and Duplicated Instructions , 2002, IEEE Trans. Computers.

[40]  Prithviraj Banerjee,et al.  Low Cost Concurrent Error Detection in a VLIW Architecture Using Replicated Instructions , 1992, ICPP.