Using powerful processors in a configurable systolic array architecture

This thesis develops a highly configurable array architecture for powerful processors and shows that this architecture is highly effective in supporting fault tolerance. The architecture uses a two-dimensional switch network and implements virtual channels, a deadlock-free mechanism that multiplexes physical connections to provide multiple logical connections for inter-processor communication. The architecture makes efficient use of redundant processors in an array and allows various on-line testing schemes to be implemented for concurrent testing of the array during program execution. It also supports a program model that decouples the logical array seen by a program from the physical array it actually runs on. The architecture is evaluated for an array of Warp cells, using real application programs and extensive simulation. For small and medium-size arrays (up to 16 x 16) with redundant processors, close to the best possible survivability can be achieved with relatively little performance degradation. In particular, it is shown that a modest degree of multiplexing in physical connections (4-5) is sufficient to ensure successful mapping of a logical array onto a physical array as long as there are enough functional components. This modest degree of multiplexing leads to little or no performance degradation for most programs. It also implies that a modest switch would be sufficient to construct configurable arrays of various sizes and with various degrees of redundancy. It is shown that all or nearly all of the redundant cells can be used for different on-line testing schemes while tolerating faulty components in the array.
Moreover, it is shown that the impact of on-line testing schemes on program performance would be small (around 5%) for most programs, and that an on-line testing scheme such as shadowing would provide significant fault coverage (around 98%) for faults that may cause errors in the output of a cell program such as FFT. The thesis provides a broad attack on the problem of evaluating a system without building it. (Abstract shortened with permission of author.)