A Probabilistic Characterization of Fault Rings in Adaptively-Routed Mesh Interconnection Networks

With growing concern for reliability in current and next-generation multiprocessor systems-on-chip (MP-SoCs), multicomputers, cluster computers, and peer-to-peer communication networks, fault tolerance has become an integral part of these systems. A fundamental issue in fault tolerance is how to route messages efficiently in a faulty network in which each component has some probability of failure. Adaptive fault-tolerant routing algorithms have frequently been suggested in the literature as a means of improving communication performance and meeting the fault-tolerance demands of computer systems. Several results have also been reported on the use of fault rings to provide detours for messages blocked by faults and to route messages adaptively around rectangular faulty regions. To analyze the performance of such routing schemes, one must investigate the characteristics of fault rings. In this paper, we derive mathematical expressions for the probability that a message encounters a fault ring in the well-known mesh interconnection network. We also conduct extensive simulation experiments using a variety of faults, and use the results to confirm the accuracy of the proposed models.
