Design and analysis of software reconfiguration strategies for hypercube multicomputers under multiple faults

The authors discuss the design of a software reconfiguration strategy for hypercube multicomputer architectures under multiple faults. Unlike previous schemes, the strategy requires no redundant hardware; instead, it supports reconfiguration through graceful degradation. It is based on running multiple virtual processors on each physical processor and using these virtual processors to redistribute the workload when faults occur. The authors describe an environment, developed on a commercially available Intel iPSC/2 hypercube multicomputer, for implementing the software-based fault tolerance scheme. They report experiments performed in this environment that measure the performance degradation of application programs under hardware faults. The reconfiguration scheme shows low overhead at low cost, and even provides improved efficiency on a fault-free hypercube.
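
The virtual-processor idea can be illustrated with a minimal sketch. The abstract does not give the authors' actual mapping or redistribution policy, so everything here is an assumption made for illustration: the function names (`initial_mapping`, `reconfigure`), the block assignment of virtual processors to nodes, and the policy of moving each displaced virtual processor to the nearest healthy node (by Hamming distance), preferring the least-loaded one.

```python
from collections import defaultdict


def initial_mapping(dim, vps_per_node):
    """Assign vps_per_node virtual processors (VPs) to each of the 2**dim nodes.

    Hypothetical block layout: VP i starts on physical node i // vps_per_node.
    """
    return {vp: vp // vps_per_node for vp in range((1 << dim) * vps_per_node)}


def hamming(a, b):
    """Number of hypercube links between nodes a and b on a shortest path."""
    return bin(a ^ b).count("1")


def reconfigure(mapping, dim, faulty):
    """Return a new VP -> node mapping with no VP left on a faulty node.

    Assumed policy (not taken from the paper): each displaced VP moves to the
    closest healthy node, breaking ties by current load, then by node number.
    """
    faulty = set(faulty)
    healthy = [n for n in range(1 << dim) if n not in faulty]
    if not healthy:
        raise ValueError("no healthy node left to host virtual processors")

    new_mapping = {}
    load = defaultdict(int)

    # VPs on healthy nodes stay where they are.
    for vp, node in mapping.items():
        if node not in faulty:
            new_mapping[vp] = node
            load[node] += 1

    # Displaced VPs are redistributed one at a time, updating loads as we go.
    for vp, node in sorted(mapping.items()):
        if node in faulty:
            target = min(healthy, key=lambda n: (hamming(n, node), load[n], n))
            new_mapping[vp] = target
            load[target] += 1

    return new_mapping
```

For example, `reconfigure(initial_mapping(3, 2), 3, {0, 1})` models a 3-cube with two virtual processors per node after nodes 0 and 1 fail: the four displaced virtual processors are spread over healthy neighbors, so the application keeps running on fewer physical nodes at reduced speed, with no spare hardware involved.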
