Adaptive box-based efficient fault-tolerant routing in 3D torus

In this paper, we propose efficient fault-tolerant routing algorithms for 3D tori with a possibly large number of faulty nodes. No assumption is made about the number or the distribution of the faulty nodes. Using only local fault information, the proposed algorithms find a fault-free path between any two nonfaulty nodes with high probability in linear time. Our simulation results show that the algorithms find a fault-free path between any two nonfaulty nodes with probability higher than 90% in a 3D torus in which up to 30% of the nodes are faulty.
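To make the setting concrete, the following is a minimal sketch of a 3D torus with faulty nodes. It is not the box-based algorithm proposed in the paper: it is a plain breadth-first search with global knowledge of the faults, of the kind a simulation might use as a reference to decide whether any fault-free path exists between two nodes. The names `neighbors` and `fault_free_path`, the side length `k`, and the representation of faults as a set of coordinates are all assumptions made for illustration.

```python
from collections import deque
from typing import Dict, Iterator, List, Optional, Set, Tuple

Node = Tuple[int, int, int]


def neighbors(node: Node, k: int) -> Iterator[Node]:
    """Yield the neighbors of a node in a k x k x k torus.

    Each node has a link in the +1 and -1 direction of every axis,
    and the links wrap around modulo k.
    """
    for axis in range(3):
        for step in (-1, 1):
            coords = list(node)
            coords[axis] = (coords[axis] + step) % k
            yield (coords[0], coords[1], coords[2])


def fault_free_path(src: Node, dst: Node, k: int,
                    faulty: Set[Node]) -> Optional[List[Node]]:
    """Return a shortest fault-free path from src to dst, or None.

    Assumption: this baseline sees the whole fault set, unlike a
    routing algorithm that relies only on local fault information.
    """
    if src in faulty or dst in faulty:
        return None
    parent: Dict[Node, Optional[Node]] = {src: None}
    queue = deque([src])
    while queue:
        current = queue.popleft()
        if current == dst:
            # Walk the parent links back to the source.
            path: List[Node] = []
            step: Optional[Node] = current
            while step is not None:
                path.append(step)
                step = parent[step]
            return path[::-1]
        for nxt in neighbors(current, k):
            if nxt not in faulty and nxt not in parent:
                parent[nxt] = current
                queue.append(nxt)
    return None
```

In a simulation, the success rate of a local-information routing algorithm could be measured against this baseline by counting only the source and destination pairs for which `fault_free_path` returns a path.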
