Hypercube management in the presence of node failures

The problem of allocating and releasing subcubes in a hypercube with node failures is addressed. Two algorithms are presented, both based on the Buddy allocation scheme for memory management, which is also used by the AXIS operating system of the NCUBE hypercube computer. The first algorithm is a simple variation of the Buddy algorithm that permits efficient allocation of subcubes in the presence of a single faulty node. The second algorithm, which effectively subsumes the first, aims to reduce the fragmentation caused by multiple failed nodes. It uses a relabeling scheme to group the failed nodes so that large fault-free subcubes can be detected by the Buddy allocation scheme. The relative performance of these algorithms is studied by simulation, and the proposed algorithms are shown to perform consistently better. Issues relating to the detection of faulty nodes on the NCUBE computer and the consequences of the relabeling for message passing are also discussed.
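To make the setting concrete, the sketch below shows Buddy-style subcube allocation that skips faulty nodes. It also shows why relabeling can help. The sketch rests on several assumptions. Node labels are n-bit integers. A k-subcube is recognized only if its 2^k labels form an aligned block whose base is a multiple of 2^k, which is how the Buddy scheme sees subcubes. The relabeling is modeled as a permutation of address bits (dimensions). Such a permutation preserves hypercube adjacency, but every node label changes, so message passing must translate addresses. The abstract does not specify the paper's relabeling scheme. The brute-force search over permutations here (`best_relabeling`, `largest_free_subcube`, `BuddySubcubeAllocator`, all hypothetical names) is an illustrative stand-in, feasible only for small n, and not the authors' algorithm.

```python
from itertools import permutations


def permute_bits(label, perm):
    """Relabel a node: bit i of the old label becomes bit perm[i] of the new one."""
    new = 0
    for i, target in enumerate(perm):
        if (label >> i) & 1:
            new |= 1 << target
    return new


def largest_free_subcube(n, faulty):
    """Largest k such that some aligned (Buddy-recognizable) k-subcube has no faulty node."""
    for k in range(n, -1, -1):
        size = 1 << k
        for base in range(0, 1 << n, size):
            if not any(base <= f < base + size for f in faulty):
                return k
    return -1  # every node is faulty


def best_relabeling(n, faulty):
    """Hypothetical brute-force search over dimension permutations (small n only)."""
    best_perm, best_k = tuple(range(n)), largest_free_subcube(n, faulty)
    for perm in permutations(range(n)):
        k = largest_free_subcube(n, [permute_bits(f, perm) for f in faulty])
        if k > best_k:  # keep the first permutation achieving the largest subcube
            best_perm, best_k = perm, k
    return best_perm, best_k


class BuddySubcubeAllocator:
    """Hypothetical Buddy-style allocator that never hands out a subcube with a faulty node."""

    def __init__(self, n, faulty=()):
        self.n = n
        self.faulty = set(faulty)
        # Faulty nodes are marked busy permanently.
        self.busy = [node in self.faulty for node in range(1 << n)]

    def allocate(self, k):
        """Return the base label of the first free aligned k-subcube, or None."""
        size = 1 << k
        for base in range(0, 1 << self.n, size):
            if not any(self.busy[base:base + size]):
                for node in range(base, base + size):
                    self.busy[node] = True
                return base
        return None

    def release(self, base, k):
        """Free an allocated k-subcube; faulty nodes stay unavailable."""
        for node in range(base, base + (1 << k)):
            if node not in self.faulty:
                self.busy[node] = False
```

For example, in a 3-cube with faulty nodes 0 and 4, the two faults differ only in the highest dimension, so every aligned 2-subcube contains one of them and `largest_free_subcube(3, [0, 4])` is 1. After a relabeling that moves that dimension to a lower bit position, both faults fall into the same aligned 2-subcube, and the other 2-subcube becomes allocable.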