On the fault tolerance of some popular bounded-degree networks

The authors analyze the fault-tolerance properties of several bounded-degree networks that are commonly used for parallel computation. Among other things, they show that an $N$-node butterfly containing $N^{1-\epsilon}$ worst-case faults (for any constant $\epsilon > 0$) can emulate a fault-free butterfly of the same size with only constant slowdown. Similar results are proved for the shuffle-exchange graph. Hence, these networks become the first connected bounded-degree networks known to be able to sustain more than a constant number of worst-case faults without suffering more than a constant-factor slowdown in performance. They also show that an $N$-node butterfly whose nodes fail with some constant probability $p$ can emulate a fault-free version of itself with a slowdown of $2^{O(\log^* N)}$, which is a very slowly increasing function of $N$. The proofs of these results combine the technique of redundant computation with new algorithms for routing packets around faults in hypercubic networks. Techniques for reconfiguring hypercubic networks around faults that do not rely on redundant computation are also presented. These techniques tolerate fewer faults but are more widely applicable since they can be used with other networks such as binary trees and meshes of trees.
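To give a feel for how slowly the random-fault slowdown bound grows, the following is a minimal sketch of the iterated logarithm (log-star) in Python. It is an illustration only and not taken from the paper: the function name `log_star` is hypothetical, base-2 logarithms are assumed, and the constant hidden in the O-notation is arbitrarily taken to be 1.

```python
import math


def log_star(n: float) -> int:
    """Iterated base-2 logarithm: how many times log2 must be
    applied to n before the result is at most 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


if __name__ == "__main__":
    # Assumption: the hidden constant in O(log* N) is 1, for illustration only.
    for exponent in (4, 16, 64, 1024):
        n = 2 ** exponent
        steps = log_star(n)
        print(f"N = 2^{exponent}: log* N = {steps}, 2^(log* N) = {2 ** steps}")
```

Under these assumptions, log-star stays at 5 or below even for N as large as 2^1024, so the bound is far smaller than any fixed power of log N for network sizes of practical interest.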
