Routing in modular fault-tolerant multiprocessor systems

The authors consider a class of modular multiprocessor architectures in which spares are added to each module to cover for faulty nodes within that module, thus forming a fault-tolerant basic block (FTBB). The goal is to preserve the logical adjacency between active nodes by means of a routing algorithm that delivers messages successfully to their destinations. Two-phase routing strategies are introduced that route messages first to their destination FTBB, and then to the destination node within that FTBB. This strategy may be applied to a variety of architectures, including binary hypercubes and 3-D tori. It is shown that, in the presence of $f$ faults in these systems, the worst-case length of the message route is $\max(\sigma + f, (K+1)\sigma) + M$, where $\sigma$ is the length of the shortest path in the absence of faults, and $M$ and $K$ are the numbers of primary nodes and spare nodes in an FTBB, respectively. The average routing overhead is much lower than the worst-case overhead.
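The two-phase idea and the stated bound can be sketched as a toy model. This is not the authors' algorithm: it assumes (our assumption) that FTBB labels are d-bit integers joined as a binary hypercube, that phase 1 uses dimension-order (e-cube) routing between FTBBs, and that a faulty primary slot is taken over by the next healthy spare. The names `assign_slots`, `ecube_path`, `two_phase_route`, and `worst_case_route_length` are hypothetical; only the bound formula comes from the abstract.

```python
from typing import Dict, List, Set, Tuple


def assign_slots(M: int, K: int, faulty: Set[int]) -> Dict[int, int]:
    """Map logical nodes 0..M-1 to healthy physical slots 0..M+K-1 of one FTBB.

    Hypothetical policy: primaries keep their own slot; a faulty primary is
    replaced by the lowest-numbered healthy spare (slots M..M+K-1).
    """
    spares = [s for s in range(M, M + K) if s not in faulty]
    placement: Dict[int, int] = {}
    for node in range(M):
        if node not in faulty:
            placement[node] = node
        elif spares:
            placement[node] = spares.pop(0)
        else:
            raise ValueError("more faulty primaries than healthy spares in this FTBB")
    return placement


def ecube_path(src: int, dst: int, dims: int) -> List[int]:
    """Phase 1 (assumed): fix differing label bits from lowest to highest."""
    path = [src]
    cur = src
    for bit in range(dims):
        if ((cur ^ dst) >> bit) & 1:
            cur ^= 1 << bit
            path.append(cur)
    return path


def two_phase_route(src_ftbb: int, dst_ftbb: int, dst_logical: int,
                    dims: int, placement: Dict[int, int]) -> Tuple[List[int], int]:
    """Return (FTBB labels visited in phase 1, physical slot reached in phase 2)."""
    ftbb_path = ecube_path(src_ftbb, dst_ftbb, dims)  # phase 1: reach destination FTBB
    host_slot = placement[dst_logical]                # phase 2: slot hosting the logical node
    return ftbb_path, host_slot


def worst_case_route_length(sigma: int, f: int, M: int, K: int) -> int:
    """Worst-case route length max(sigma + f, (K + 1) * sigma) + M from the abstract."""
    return max(sigma + f, (K + 1) * sigma) + M


if __name__ == "__main__":
    placement = assign_slots(M=4, K=1, faulty={2})  # logical node 2 moves to spare slot 4
    print(two_phase_route(0b000, 0b101, 2, 3, placement))
    print(worst_case_route_length(sigma=3, f=2, M=4, K=1))
```

For instance, with $\sigma = 3$, $f = 2$, $M = 4$, and $K = 1$, the bound evaluates to $\max(5, 6) + 4 = 10$; once $f$ grows large enough that $\sigma + f > (K+1)\sigma$, the fault count dominates instead.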