A programmable two-dimensional (2D) processor array is fault-tolerant if faulty processors can be detected and then avoided during program execution. As VLSI increases the number of processors that can be implemented in a 2D array, more and more cells can be devoted to fault tolerance. The literature contains many schemes for detecting faulty processors and reconfiguring data routing to avoid them. However, implementing these schemes efficiently on a 2D array can be extremely difficult if application, fault detection, and reconfiguration must all be considered at the same time. The virtual channels mechanism of this paper allows these concerns to be dealt with separately and efficiently. An application or fault detection program may assume that every logical connection between processors is implemented by a dedicated physical connection. A physical connection is composed of a sequence of virtual channels. Since the number of virtual channels between any two processors is not bounded by the number of available physical channels, all dedicated physical connections required by the program can be implemented. The mapping of logical connections to physical connections, and the scheduling of a physical channel to implement multiple virtual channels, are completely transparent to a program and can be optimized independently. Various fault tolerance schemes thus become readily implementable without added programming difficulty. For example, it is straightforward to execute application and fault detection programs concurrently on the same 2D array. A switch architecture suitable for VLSI implementation is presented for implementing the virtual channels mechanism. The switch captures all the architectural features needed to implement the mechanism and can be used with different processors. By using multiple copies of this switch, a variety of fault-tolerant 2D arrays can be formed. In particular, the switch architecture is planned for use in building a fault-tolerant 2D Warp array.
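The idea that one physical channel can carry any number of virtual channels can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `PhysicalChannel`, `open_virtual_channel`, `send`, `step`, and `run_demo` are hypothetical, and the round-robin policy merely stands in for whatever scheduling the switch actually uses, which the abstract does not specify.

```python
from collections import deque


class PhysicalChannel:
    """One physical link time-shared by any number of virtual channels.

    Hypothetical model: round-robin scheduling is an assumed policy,
    not the one described in the paper.
    """

    def __init__(self):
        self.queues = {}      # virtual channel id -> pending words
        self.order = deque()  # round-robin order of virtual channel ids

    def open_virtual_channel(self, vc_id):
        # Virtual channels are created on demand, so their number is not
        # limited by the single physical link they share.
        self.queues[vc_id] = deque()
        self.order.append(vc_id)

    def send(self, vc_id, word):
        # A program writes to its own channel as if it were dedicated.
        self.queues[vc_id].append(word)

    def step(self):
        """Transfer at most one word per cycle; return (vc_id, word) or None."""
        for _ in range(len(self.order)):
            vc_id = self.order[0]
            self.order.rotate(-1)  # the next cycle starts at the next channel
            if self.queues[vc_id]:
                return vc_id, self.queues[vc_id].popleft()
        return None  # every virtual channel is empty


def run_demo():
    link = PhysicalChannel()
    link.open_virtual_channel("app")   # application traffic
    link.open_virtual_channel("test")  # fault detection traffic
    for word in (1, 2, 3):
        link.send("app", word)
    link.send("test", "probe")
    transferred = []
    while (item := link.step()) is not None:
        transferred.append(item)
    return transferred
```

In this sketch the application and the fault detection program each write to their own channel without knowing about the other, and the link interleaves their words, which mirrors the concurrent execution mentioned in the abstract.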