A Fault-Tolerant Communication Scheme for Hypercube Computers

A fault-tolerant communication scheme that facilitates near-optimal routing and broadcasting in hypercube computers subject to node failures is described. The concept of an unsafe node is introduced to identify fault-free nodes that may cause communication difficulties. It is shown that by using only 'feasible' paths, which try to avoid unsafe nodes, routing and broadcasting can be substantially simplified. A computationally efficient routing algorithm that uses local information is presented. It can route a message via a path of length no greater than p+2, where p is the minimum distance from the source to the destination, provided that not all nonfaulty nodes in the hypercube are unsafe. Under the same fault conditions, broadcasting takes only one more time unit than in the fault-free case. The problems posed by deadlock in faulty hypercubes are discussed, and deadlock-free implementations of the proposed communication schemes are presented.
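The abstract does not spell out when a node counts as unsafe, so the sketch below assumes the definition usually associated with this scheme: a nonfaulty node is unsafe if it has at least two faulty neighbors, or at least three neighbors that are faulty or unsafe. Treat this rule as an assumption. The function names `classify` and `next_hop` are hypothetical. `next_hop` is only a simplified greedy step that shows the preference for safe nodes on shortest paths. It is not the paper's routing algorithm and does not by itself guarantee the p+2 bound.

```python
from typing import Dict, Iterator, Optional, Set

FAULTY, UNSAFE, SAFE = "faulty", "unsafe", "safe"


def neighbors(node: int, n: int) -> Iterator[int]:
    """Yield the n neighbors of `node` in an n-cube (addresses differ in one bit)."""
    for d in range(n):
        yield node ^ (1 << d)


def classify(n: int, faulty: Set[int]) -> Dict[int, str]:
    """Label each node of an n-cube as faulty, unsafe, or safe.

    Assumed rule: a nonfaulty node is unsafe if it has at least two faulty
    neighbors, or at least three neighbors that are faulty or unsafe.
    Because one node turning unsafe can push a neighbor over the threshold,
    the labels are recomputed until nothing changes. Nodes only ever move
    from safe to unsafe, so the loop terminates.
    """
    status = {v: FAULTY if v in faulty else SAFE for v in range(1 << n)}
    changed = True
    while changed:
        changed = False
        for v in range(1 << n):
            if status[v] != SAFE:
                continue
            labels = [status[u] for u in neighbors(v, n)]
            n_faulty = labels.count(FAULTY)
            n_bad = n_faulty + labels.count(UNSAFE)
            if n_faulty >= 2 or n_bad >= 3:
                status[v] = UNSAFE
                changed = True
    return status


def next_hop(cur: int, dst: int, n: int, status: Dict[int, str]) -> Optional[int]:
    """Hypothetical greedy step, not the paper's algorithm.

    Only neighbors on a shortest path to dst (preferred dimensions) are
    considered. A nonfaulty destination is reached directly; otherwise a safe
    neighbor is preferred over an unsafe one, and faulty neighbors are never used.
    Returns None when no preferred neighbor is usable, in which case a full
    scheme would need a detour through a spare dimension.
    """
    diff = cur ^ dst
    preferred = [cur ^ (1 << d) for d in range(n) if (diff >> d) & 1]
    if dst in preferred and status[dst] != FAULTY:
        return dst
    for wanted in (SAFE, UNSAFE):
        for u in preferred:
            if status[u] == wanted:
                return u
    return None
```

For example, in a 3-cube with faulty nodes 000 and 011, `classify(3, {0b000, 0b011})` marks 001 and 010 as unsafe, because each has two faulty neighbors. Nodes 100, 101, 110, and 111 stay safe. A message from 001 to 111 is then forwarded first to the safe node 101 rather than the faulty node 011. Note that every decision uses only the status of the current node's neighbors, which is in keeping with the local-information routing the abstract describes.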