A Fault-Tolerant Dataflow System

The dataflow model of computation allows functions to run concurrently on multiple processors, reducing execution time significantly. If processors fail, however, this advantage is lost along with the partial results of the computation. A crucial feature of a concurrent system is therefore fault tolerance: the ability to continue the computation when components fail. Although several dataflow architectures have been proposed, few are both fault tolerant and able to balance the system load dynamically. A distributed computer system (DCS) based on a task-level dataflow architecture, however, can reduce traffic, speed communication between processors, and tolerate hardware faults by automatically reassigning computations to a healthy processor. Such a DCS has the potential to outperform conventional multiprocessors because the execution of a function is free of side effects. By asking when and how to do node reassignment while the dataflow architecture and processor are being designed, designers can incorporate the necessary support mechanisms.
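The reassignment idea can be illustrated with a minimal sketch. It is not the architecture described in this article: the names `Processor`, `ProcessorFailure`, and `run_with_reassignment` are hypothetical, processors are simulated in a single process, and failure is modeled as a simple flag. The point it shows is that a side-effect-free function can be rerun on another processor with the same inputs without corrupting any shared state.

```python
class ProcessorFailure(Exception):
    """Raised when a simulated processor cannot run a task (hypothetical)."""


class Processor:
    """A simulated processor; `healthy` is an assumed stand-in for fault detection."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def run(self, fn, *args):
        if not self.healthy:
            raise ProcessorFailure(self.name)
        return fn(*args)


def run_with_reassignment(fn, args, processors):
    """Try each processor in turn and return (result, name of the processor used).

    Because fn has no side effects, rerunning it elsewhere with the same
    inputs is safe and yields the same result.
    """
    for processor in processors:
        try:
            return processor.run(fn, *args), processor.name
        except ProcessorFailure:
            continue  # reassign the task to the next candidate processor
    raise RuntimeError("no healthy processor available")


def add(a, b):
    # A pure function: its output depends only on its inputs.
    return a + b


processors = [Processor("P0", healthy=False), Processor("P1")]
result, ran_on = run_with_reassignment(add, (2, 3), processors)
```

Here the task is first offered to the failed processor `P0` and then reassigned to `P1`. A real system would also need fault detection and recovery of the task's input tokens, which this sketch leaves out.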