Reconfiguration of Rings and Meshes in Faulty Hypercubes

In this paper we present schemes for reconfiguring task graphs embedded in hypercubes. Previous approaches, based on either fault-tolerant embedding or automorphisms, can be expensive in the number of spare nodes required or in reconfiguration time. Using the free dimension concept, we combine the two approaches in schemes that tolerate about n faulty nodes in the worst case while keeping task migration time small. Starting from an expansion-2 initial embedding, we present three distributed reconfiguration schemes. The first scheme, which applies to chains and rings, tolerates any set of up to n − 2 faulty nodes in an n-dimensional hypercube. The second and third schemes apply to meshes and tori. For a mesh or torus of size 2^{m_1} × ··· × 2^{m_d}, the second scheme tolerates any set of up to m_i − 1 faulty nodes, where m_i corresponds to the largest direction of the mesh and n = m_1 + ··· + m_d + 1. By embedding two copies of the mesh or torus in the cube, the third scheme tolerates any set of up to n − 1 faulty nodes, at the cost of the dilation of the embedding degrading to 2 after reconfiguration. The third scheme is quite general and can be applied to any task graph.
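
To make the expansion-2 setting concrete, the following minimal sketch embeds a ring of 2^(n−1) nodes in an n-dimensional hypercube using the standard reflected Gray code, and evaluates the fault bounds stated above for a given mesh. It is an illustration under stated assumptions, not the paper's reconfiguration schemes: the function names (`embed_ring`, `dilation`, `ring_fault_bound`, `mesh_fault_bounds`) are hypothetical, and the Gray-code mapping is one common choice of initial embedding rather than necessarily the one the authors use.

```python
def gray(i: int) -> int:
    """Reflected binary Gray code of i."""
    return i ^ (i >> 1)


def embed_ring(n: int) -> list[int]:
    """Map ring node i (0 <= i < 2**(n-1)) to an n-cube address.

    Only n-1 address bits are used, so half of the 2**n cube nodes
    stay unused (expansion 2). This mapping is an assumed example.
    """
    return [gray(i) for i in range(2 ** (n - 1))]


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two cube addresses differ."""
    return bin(a ^ b).count("1")


def dilation(addresses: list[int]) -> int:
    """Largest cube distance between images of adjacent ring nodes."""
    k = len(addresses)
    return max(hamming(addresses[i], addresses[(i + 1) % k]) for i in range(k))


def ring_fault_bound(n: int) -> int:
    """Bound stated for the first scheme (chains and rings): n - 2."""
    return n - 2


def mesh_fault_bounds(ms: list[int]) -> dict[str, int]:
    """Bounds stated for a 2^{m_1} x ... x 2^{m_d} mesh or torus."""
    n = sum(ms) + 1  # cube dimension with expansion 2
    return {"n": n, "scheme2": max(ms) - 1, "scheme3": n - 1}


if __name__ == "__main__":
    ring = embed_ring(4)
    print(ring, dilation(ring))
    print(ring_fault_bound(4))
    print(mesh_fault_bounds([2, 3]))
```

For a 4 × 8 mesh (m_1 = 2, m_2 = 3), the sketch gives n = 6, so the stated bounds are 2 faulty nodes for the second scheme and 5 for the third.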
