Distributed Fault Tolerant Embedding of Binary Trees and Rings in Hypercubes

In this paper, we first present fault-tolerance techniques based on distributed algorithms for embedding binary trees in hypercubes. The embedding starts at the root, which a host invokes in some cube node. Each node then determines the addresses of its children and invokes the embedding algorithm for the subtree rooted at each child in the corresponding cube node. This distributed embedding, combined with the hypercube's abundance of communication links, offers considerable potential for fault tolerance. We demonstrate this capability by introducing restructuring techniques that can tolerate faults during the initial embedding and can also remap nodes that fail at run time. Because the embedding is distributed, no global knowledge of faulty nodes is required; each node needs to know only the status of its neighbors.
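The abstract does not give the address-assignment rule. The sketch below therefore substitutes a well-known scheme, the inorder labeling of a complete binary tree, to illustrate the recursive "each node places its own children" pattern. It is not claimed to be the authors' mapping. It assumes a complete binary tree with 2^n - 1 nodes mapped one-to-one into an n-cube, with the host invoking the root at cube node 2^(n-1). A node of height k (leaves have height 0) at address p computes its children locally as p - 2^(k-1) and p + 2^(k-1). That places the right child one hop away and the left child two hops away, so the dilation is 2. The function names are hypothetical, and the sketch does not model faults or the paper's restructuring techniques.

```python
from typing import List, Tuple


def embed_subtree(addr: int, height: int, edges: List[Tuple[int, int]]) -> None:
    """Simulate the invocation at cube node `addr`, which hosts a tree node whose
    subtree has the given height (leaves have height 0). The node derives its
    children's addresses from its own address and height only, then invokes
    the same procedure at each child (hypothetical inorder scheme)."""
    if height == 0:
        return  # leaf: no children to place
    offset = 1 << (height - 1)
    for child in (addr - offset, addr + offset):  # left child, right child
        edges.append((addr, child))
        embed_subtree(child, height - 1, edges)


def embed_tree(n: int) -> Tuple[int, List[Tuple[int, int]]]:
    """Embed a complete binary tree with 2**n - 1 nodes in an n-cube.
    Returns the root's cube address and the list of (parent, child) edges."""
    root = 1 << (n - 1)  # the "host" invokes the root here
    edges: List[Tuple[int, int]] = []
    embed_subtree(root, n - 1, edges)
    return root, edges


def hamming(a: int, b: int) -> int:
    """Hypercube distance between two node addresses."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    root, edges = embed_tree(3)
    nodes = {root} | {c for _, c in edges}
    print(sorted(nodes))
    print(max(hamming(p, c) for p, c in edges))
```

For n = 3, the root lands on node 4 (binary 100), its children on 2 and 6, and the leaves on 1, 3, 5, and 7. Node 0 stays unused. Because every node computes its children's addresses from purely local information, the same code could run independently on each processor. This is the property the abstract relies on.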