A Programming Model and Language Implementation for Concurrent Failure-Prone Hardware

We present a programming model, and its embodiment in a language implementation, for systems composed of large numbers of failure-prone, resource-constrained elements interconnected by error-prone networks. The programming model enables applications to be partitioned, without replication, across multiple devices with constrained memory resources. It permits programs to specify the amount of error (value deviation) tolerable in individual variables, as well as the tolerable latencies and erasures on communications. The value deviation constraints facilitate compile-time transformations for forward error correction; these transformations keep the value deviations in individual variables within program-specified bounds in the presence of an assumed distribution of logic upsets in hardware. To account for situations in which such assumptions are violated, language constructs allow control flow to change in response to tolerance constraint violations. The language model and implementation are targeted primarily at concurrent, failure-prone embedded systems, such as sensor networks.
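
To make the idea of a tolerance constraint with a control-flow change on violation concrete, the following is a minimal sketch in Python, not the language presented here. All names (ToleranceViolation, check_tolerance, read_with_fallback) are hypothetical, and the explicit reference value is an assumption that stands in for whatever mechanism an implementation would use to detect deviation at run time.

```python
class ToleranceViolation(Exception):
    """Hypothetical signal: a value deviated beyond its declared bound."""


def check_tolerance(reference, observed, max_deviation):
    """Return observed if it lies within max_deviation of reference, else raise."""
    deviation = abs(observed - reference)
    if deviation > max_deviation:
        raise ToleranceViolation(
            f"deviation {deviation} exceeds bound {max_deviation}"
        )
    return observed


def read_with_fallback(reference, observed, max_deviation, fallback):
    """Use observed when within tolerance; otherwise take the alternate path."""
    try:
        return check_tolerance(reference, observed, max_deviation)
    except ToleranceViolation:
        # Control flow changes here in response to the constraint violation.
        return fallback
```

In this sketch the exception handler plays the role of the alternate control-flow path: a caller declares a bound, proceeds normally while the bound holds, and falls back to a recovery value when it does not.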