Virtual channels for fault-tolerant programmable two-dimensional processor arrays

A programmable two-dimensional (2D) processor array is fault-tolerant if faulty processors can be detected and then avoided during program execution. The literature contains many schemes for detecting faulty processors and for reconfiguring data routing to avoid them. However, implementing these schemes efficiently on a 2D array can be an extremely difficult programming task if the application, fault detection, and reconfiguration must all be considered at the same time. The virtual channels mechanism of this paper allows these concerns to be dealt with separately and efficiently. An application or fault detection program may assume that every logical connection between processors is implemented by a dedicated physical connection. A physical connection is composed of a sequence of virtual channels. Since the number of virtual channels between any two processors is not bounded by the number of available physical channels, all dedicated physical connections required by the program can be implemented. The mapping of logical connections to physical connections, and the scheduling of a physical channel to implement multiple virtual channels, are totally transparent to a program and can be optimized independently. Various fault tolerance schemes thus become readily implementable without programming difficulty. For example, it is straightforward to execute application and fault detection programs concurrently on the same 2D array. A switch architecture is presented for implementing the virtual channels mechanism. This architecture is intended for use in building a fault-tolerant 2D Warp array.
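To illustrate how one physical channel can carry several virtual channels, the following is a minimal sketch in Python. It is not the switch architecture of the paper: the class name `PhysicalChannel`, its methods, the per-channel queues, and the round-robin scheduling policy are all assumptions chosen for illustration. Each sender writes to its own virtual channel as if it owned a dedicated connection, while the physical channel transfers one word per step.

```python
from collections import deque


class PhysicalChannel:
    """Hypothetical model of one physical link shared by many virtual channels."""

    def __init__(self):
        self.queues = {}      # virtual channel id -> words waiting to be sent
        self.order = deque()  # round-robin order of virtual channel ids

    def open_virtual_channel(self, vc_id):
        """Register a new virtual channel; the number of channels is unbounded."""
        if vc_id in self.queues:
            raise ValueError(f"virtual channel {vc_id!r} already open")
        self.queues[vc_id] = deque()
        self.order.append(vc_id)

    def send(self, vc_id, word):
        """Queue a word on a virtual channel, as if the link were dedicated."""
        self.queues[vc_id].append(word)

    def step(self):
        """Transfer at most one word over the physical link.

        Returns (vc_id, word), or None if every virtual channel is empty.
        Round-robin scheduling is an assumed policy, not the paper's.
        """
        for _ in range(len(self.order)):
            vc_id = self.order[0]
            self.order.rotate(-1)  # the channel just examined moves to the back
            if self.queues[vc_id]:
                return vc_id, self.queues[vc_id].popleft()
        return None


def drain(channel):
    """Run the link until idle; return the wire order and per-channel streams."""
    wire = []
    received = {vc_id: [] for vc_id in channel.queues}
    while (item := channel.step()) is not None:
        vc_id, word = item
        wire.append(item)
        received[vc_id].append(word)
    return wire, received
```

In this sketch, an application and a fault detection program could each open a virtual channel on the same link (for example `"app"` and `"fault_detect"`) and call `send` independently. `drain` then shows that the words interleave on the wire while each virtual channel still delivers its own words in order, which is the sense in which scheduling is transparent to the programs.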