Fault-tolerant routing in MIN-based supercomputers

In this paper we study methods for routing data in supercomputers that use multistage interconnection networks (MINs) when some network components are faulty. These methods are applicable to existing multiprocessors such as the IBM GF11 and RP3. They are based on the concept of dynamic full access (DFA), which refers to the ability of the network to route data from any processor in the system to any other processor in a finite number of passes through the network. We introduce a graph model, called the DFA graph of a MIN, and show how it can be used to determine the DFA capability of the MIN under a given set of network faults. When the faults in the network satisfy certain special properties, we present algorithms for routing an arbitrary permutation in a faulty Beneš network and any Omega permutation in a faulty Omega network. These algorithms are simple and operate in a distributed fashion. Together, these techniques allow a supercomputer to efficiently realize the data permutations needed in a parallel computing environment despite the presence of faults in the network.
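
As an illustration of the DFA property only, the following minimal sketch checks it on a small Omega network. It is not the paper's DFA graph construction. It assumes a hypothetical fault model in which a faulty 2x2 switch is stuck in the "straight" or "cross" state, and it assumes that output i of the network is fed back to processor i between passes. Under those assumptions, DFA holds exactly when the directed graph of one-pass reachability between processors is strongly connected. The names shuffle, one_pass_reach, and has_dfa are invented for this sketch.

```python
from collections import deque


def shuffle(line, n):
    """Perfect shuffle: rotate the n-bit line address left by one bit."""
    return ((line << 1) | (line >> (n - 1))) & ((1 << n) - 1)


def one_pass_reach(src, n, stuck):
    """Return the outputs reachable from input src in one pass.

    The network is an n-stage Omega network on 2**n lines. `stuck` maps
    (stage, switch_index) to 'straight' or 'cross' for faulty switches
    (an assumed fault model); switches not listed can take either state.
    """
    frontier = {src}
    for stage in range(n):
        next_frontier = set()
        for line in frontier:
            line = shuffle(line, n)
            # Lines 2k and 2k+1 enter switch k of this stage.
            state = stuck.get((stage, line >> 1))
            if state != 'cross':
                next_frontier.add(line)        # straight setting
            if state != 'straight':
                next_frontier.add(line ^ 1)    # cross setting
        frontier = next_frontier
    return frontier


def has_dfa(n, stuck):
    """True if every processor can reach every other in finitely many passes.

    Assumes output i is fed back to processor i, so the test reduces to
    strong connectivity of the one-pass reachability graph.
    """
    num_procs = 1 << n
    fwd = {s: one_pass_reach(s, n, stuck) for s in range(num_procs)}
    rev = {s: set() for s in range(num_procs)}
    for s, outs in fwd.items():
        for t in outs:
            rev[t].add(s)

    def covers_all(adj):
        # Breadth-first search from processor 0.
        seen = {0}
        queue = deque([0])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return len(seen) == num_procs

    # Strongly connected iff node 0 reaches all nodes and all nodes reach 0.
    return covers_all(fwd) and covers_all(rev)
```

In this sketch, an 8-processor network with a single first-stage switch stuck straight, has_dfa(3, {(0, 0): 'straight'}), still has DFA, because the two affected processors can reach the rest in a second pass. If every last-stage switch is stuck straight, the low address bit can never change and DFA is lost.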