A Reconfigurable Modular Fault-Tolerant Hypercube Architecture

We propose a new fault-tolerant design for hypercube systems. We first build fault-tolerant modules (FTM's) and then interconnect these FTM's to form a modular hypercube. Finally, we obtain the proposed system by augmenting the modular hypercube with additional links, called spare-sharing links (SSL's), which form a ring connection in the architecture. The distinguishing characteristic of the system is that the spare nodes in an FTM can be used either as local spares, replacing faulty nodes in the same FTM, or as remote spares, replacing faulty nodes in other FTM's via the spare-sharing links. Spare nodes in every FTM are therefore used more fully, and system reliability improves. Switch and link failures are also considered. Modular diagnosis and modular reconfiguration schemes are proposed to identify faulty nodes, switches, and links and to reconfigure the system around them.
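The local-then-remote spare policy described above can be illustrated with a small sketch. This is not the paper's reconfiguration algorithm, which the abstract does not specify. It assumes that each FTM is reduced to a count of unused spares, that the spare-sharing ring can be traversed in both directions, and that a spare may be borrowed from an FTM several hops away. The names `FTM` and `allocate_spare` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FTM:
    """Hypothetical model of one fault-tolerant module: its unused spare count."""
    spares: int


def allocate_spare(modules, faulty_module):
    """Pick the FTM that supplies a spare for a fault in `faulty_module`.

    Assumed policy (not taken from the paper): use a local spare if one
    exists, otherwise search the spare-sharing ring outward in both
    directions and take the nearest remote spare.
    Returns the index of the supplying FTM, or None if no spare is left.
    """
    n = len(modules)

    # Local spare first.
    if modules[faulty_module].spares > 0:
        modules[faulty_module].spares -= 1
        return faulty_module

    # Otherwise look for a remote spare at increasing ring distance.
    for dist in range(1, n // 2 + 1):
        for cand in ((faulty_module + dist) % n, (faulty_module - dist) % n):
            if modules[cand].spares > 0:
                modules[cand].spares -= 1
                return cand

    return None


# Four FTM's on a ring; FTM 1 and FTM 3 have no spares of their own.
ring = [FTM(1), FTM(0), FTM(2), FTM(0)]
first = allocate_spare(ring, 0)   # served by the local spare in FTM 0
second = allocate_spare(ring, 0)  # FTM 0 is now empty, so a remote spare is used
```

In this model a fault stays uncovered only when every FTM on the ring has run out of spares, which is the intuition behind the reliability gain claimed for spare sharing.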
