This paper examines routing and broadcasting algorithms for hypercube computers subject to node failures. First, some simple message-passing algorithms are described that perform well with certain fault patterns but poorly with others. The concept of an unsafe node is then introduced to identify fault-free nodes that may cause communication difficulties in faulty hypercubes. It is shown that routing and broadcasting can be substantially simplified by using only “feasible” paths, which try to avoid unsafe nodes. Each active node is assumed to know the fault status of all nodes within a specified radius k of itself. A computationally efficient routing algorithm is presented that routes a message via a path of length at most p + 2, where p is the minimum feasible distance from the source to the destination, provided that k = 1 and not all non-faulty nodes in the hypercube are unsafe. Under the same fault conditions, broadcasting is shown to take only one more time unit than in the fault-free case.
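The abstract does not state the rule that makes a node unsafe. The sketch below assumes the rule commonly used for this model: a non-faulty node is unsafe if it has at least two faulty neighbors, or at least three neighbors that are faulty or unsafe. Under that assumption, the classification can be computed as a fixed point. The sketch then routes greedily using only neighbor status (k = 1), preferring safe neighbors and detouring through a spare dimension when every preferred neighbor is unusable. The names `unsafe_nodes`, `greedy_route`, and `max_hops` are hypothetical. This is a simplified illustration, not the paper's routing algorithm, and it is not claimed to meet the p + 2 bound in general.

```python
def neighbors(v, n):
    """All n neighbors of node v in an n-cube (flip one address bit)."""
    return [v ^ (1 << d) for d in range(n)]


def unsafe_nodes(n, faulty):
    """Classify non-faulty nodes as unsafe by iterating to a fixed point.

    ASSUMED rule (not stated in the abstract): a non-faulty node is unsafe
    if it has >= 2 faulty neighbors, or >= 3 neighbors that are faulty or unsafe.
    """
    unsafe = set()
    changed = True
    while changed:  # repeat until no further node becomes unsafe
        changed = False
        for v in range(1 << n):
            if v in faulty or v in unsafe:
                continue
            nbrs = neighbors(v, n)
            n_faulty = sum(u in faulty for u in nbrs)
            n_bad = sum(u in faulty or u in unsafe for u in nbrs)
            if n_faulty >= 2 or n_bad >= 3:
                unsafe.add(v)
                changed = True
    return unsafe


def greedy_route(src, dst, n, faulty, unsafe, max_hops=None):
    """Hypothetical greedy router that uses only neighbor (k = 1) status.

    Tries preferred dimensions (bits where cur and dst differ) first, then
    spare dimensions (a detour). Within each group, a safe neighbor is chosen
    before an unsafe one. Returns the node path, or None if it gives up.
    """
    assert src not in faulty and dst not in faulty
    if max_hops is None:
        max_hops = 2 * n  # illustrative cutoff, not taken from the paper
    path, visited, cur = [src], {src}, src

    def pick(cands):
        if dst in cands:  # destination is adjacent and non-faulty
            return dst
        for allow_unsafe in (False, True):  # safe nodes first, then unsafe
            for u in cands:
                if u in faulty or u in visited:
                    continue
                if allow_unsafe or u not in unsafe:
                    return u
        return None

    while cur != dst:
        if len(path) - 1 >= max_hops:
            return None
        diff = cur ^ dst
        preferred = [cur ^ (1 << d) for d in range(n) if (diff >> d) & 1]
        spare = [cur ^ (1 << d) for d in range(n) if ((diff >> d) & 1) == 0]
        nxt = pick(preferred)
        if nxt is None:
            nxt = pick(spare)  # detour: costs two extra hops overall
        if nxt is None:
            return None
        path.append(nxt)
        visited.add(nxt)
        cur = nxt
    return path


if __name__ == "__main__":
    faulty = {0b001, 0b010}
    unsafe = unsafe_nodes(3, faulty)
    print(unsafe)  # {0, 3}
    print(greedy_route(0b000, 0b011, 3, faulty, unsafe))  # [0, 4, 5, 7, 3]
```

In this 3-cube example, nodes 001 and 010 are faulty. As a result, nodes 000 and 011 become unsafe, and both preferred neighbors of 000 on the way to 011 are faulty. The router therefore detours through the spare dimension and arrives in four hops, two more than the Hamming distance.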