Online Adaptive Fault-Tolerant Routing in 2D Torus

In this paper, we propose efficient routing algorithms for 2D tori that may contain a large number of faulty nodes. No assumption is made about the number or the distribution of the faulty nodes. Using only local routing information, the proposed algorithms find a fault-free path between any two nonfaulty nodes in linear time with high probability. Our simulation results support this. For example, in tori of size up to 128×128 with up to 15% faulty nodes, the heuristic-square routing algorithm finds a fault-free path with a probability of 90% or higher. These results are notable given that each node in a 2D torus has only four links.
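The abstract does not describe how the heuristic-square algorithm works, so the sketch below is not that algorithm. It is a hypothetical baseline that shows what routing with only local information can look like in a k×k torus. A message at node (x, y) sees only whether its four wrap-around neighbors are faulty. It moves to the fault-free, not-yet-visited neighbor that is closest to the destination in torus distance, and it backtracks one hop when it reaches a dead end. The function names `route`, `torus_neighbors`, and `torus_distance` are illustrative, and the assumption that the message carries its visited set in its header is also ours, not the paper's.

```python
from typing import Optional

Node = tuple[int, int]


def torus_neighbors(node: Node, k: int) -> list[Node]:
    """The four wrap-around neighbors of `node` in a k x k torus."""
    x, y = node
    return [((x + 1) % k, y), ((x - 1) % k, y),
            (x, (y + 1) % k), (x, (y - 1) % k)]


def torus_distance(a: Node, b: Node, k: int) -> int:
    """Minimal hop count between a and b in a fault-free k x k torus."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, k - dx) + min(dy, k - dy)


def route(src: Node, dst: Node, faulty: set[Node], k: int) -> Optional[list[Node]]:
    """Distance-guided depth-first routing that avoids faulty nodes.

    Hypothetical baseline, NOT the paper's heuristic-square algorithm.
    At each hop only the fault status of the current node's four neighbors
    is consulted (local information), plus the visited set that the message
    is assumed to carry. Each node is entered at most once, so the search
    makes O(k*k) moves.
    """
    if src in faulty or dst in faulty:
        return None
    path = [src]
    visited = {src}
    while path:
        cur = path[-1]
        if cur == dst:
            return path
        # Fault-free, not-yet-visited neighbors; prefer the one closest to dst.
        candidates = [n for n in torus_neighbors(cur, k)
                      if n not in faulty and n not in visited]
        if candidates:
            nxt = min(candidates, key=lambda n: torus_distance(n, dst, k))
            visited.add(nxt)
            path.append(nxt)
        else:
            path.pop()  # dead end: backtrack one hop
    return None  # dst is unreachable from src in the fault-free subgraph


if __name__ == "__main__":
    # (1, 0) blocks the direct route, so the message uses the wrap-around link.
    print(route((0, 0), (2, 0), {(1, 0), (1, 1), (1, 4)}, 5))
    # prints [(0, 0), (4, 0), (3, 0), (2, 0)]
```

Because this baseline never revisits a node, it finds a path whenever the source and destination are connected through nonfaulty nodes. It pays for that by carrying the visited set, and the paths it returns are not necessarily shortest.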
