Efficient fault-tolerant mesh and hypercube architectures

The authors present an efficient method for tolerating faults in d-dimensional mesh and hypercube architectures. The approach consists of adding spare processors and communication links so that the resulting architecture can be reconfigured to form the desired mesh or hypercube in the presence of faults. The cost of the fault-tolerant architecture is optimized by adding exactly k spare processors and minimizing the number of links per processor. The resulting architectures are surprisingly efficient: for example, when the desired architecture is a d-dimensional mesh and k=1, the fault-tolerant architecture has the same maximum degree as the desired architecture and has only one spare processor. Efficient layouts are presented for fault-tolerant two- and three-dimensional meshes, and it is shown that multiplexers and buses can be used to reduce the degree of the fault-tolerant architectures.
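
To make the k=1 claim concrete, the following is a minimal sketch of the simplest case only: a one-dimensional mesh (a path of n processors) made tolerant to a single fault by adding one spare processor and closing the path into a cycle of n+1 nodes. This is an illustration under stated assumptions, not the authors' construction for general d; the function names (`cycle_edges`, `reconfigure_ring`, and so on) are hypothetical. The check is brute force: for every possible faulty node, it maps the path onto the healthy nodes and confirms that every path link is a physical link of the cycle.

```python
def cycle_edges(total):
    """Undirected edges of a cycle on nodes 0..total-1 (total >= 3)."""
    return {frozenset((i, (i + 1) % total)) for i in range(total)}


def reconfigure_ring(n, faulty):
    """Map path positions 0..n-1 onto the healthy nodes of an (n+1)-node cycle.

    The path starts at the node just after the faulty one and walks
    around the cycle, so the faulty node is never used.
    """
    total = n + 1
    return [(faulty + 1 + i) % total for i in range(n)]


def is_valid_path(mapping, edges):
    """True if consecutive path positions are joined by a physical link."""
    return all(frozenset((a, b)) in edges for a, b in zip(mapping, mapping[1:]))


def max_degree(edges):
    """Maximum number of links incident on any node."""
    degree = {}
    for edge in edges:
        for node in edge:
            degree[node] = degree.get(node, 0) + 1
    return max(degree.values())


def tolerates_single_fault(n):
    """Brute-force check that the (n+1)-cycle survives any one node fault."""
    edges = cycle_edges(n + 1)
    for faulty in range(n + 1):
        mapping = reconfigure_ring(n, faulty)
        if faulty in mapping or not is_valid_path(mapping, edges):
            return False
    return True


if __name__ == "__main__":
    n = 4  # assumed example size: a path of 4 processors plus 1 spare
    print(tolerates_single_fault(n))
    print(max_degree(cycle_edges(n + 1)))
```

In this toy case the cycle has maximum degree 2, the same as the interior nodes of the path it replaces, which mirrors the degree property stated in the abstract for k=1. Higher-dimensional meshes need a different link pattern and are not covered by this sketch.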
