A new performance measure for characterizing fault rings in interconnection networks

One of the fundamental issues in parallel computers is how to route messages efficiently in a faulty network where each component fails with some probability. Adaptive fault-tolerant routing algorithms have frequently been suggested as a means of providing continuous operation in the presence of one or more failures by allowing graceful system degradation. Many such algorithms add buffer space and complex control logic to the routing nodes. However, the extra logic circuits and buffer space make the nodes themselves more liable to failure and hence less reliable. Further, if fault regions are confined to a restricted shape, many non-faulty nodes must be sacrificed and their resources are wasted. This is clearly undesirable and motivates solutions that promote the efficient use of non-faulty nodes. One approach to reducing the number of functional nodes that must be marked as faulty is based on the concept of fault rings, which support more flexible routing around rectangular fault regions. Before such schemes can be successfully incorporated into networks, it is necessary to understand clearly the factors that affect their performance potential. In this paper, we propose the first general solution for computing the probability that a message faces fault rings, with and without overlapping, in the well-known torus networks. We also conduct extensive simulation experiments using various fault patterns, and the results confirm the good accuracy of the proposed analytical models.
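To make the measured quantity concrete, the following is a minimal Monte Carlo sketch; it is not the analytical model proposed in the paper. It assumes a k x k torus, rectangular fault blocks given as hypothetical `(x0, y0, width, height)` tuples, uniformly chosen distinct fault-free source and destination nodes, and deterministic X-then-Y dimension-order minimal routing. A message is counted as facing a fault ring when its path would enter a faulty node, since the node just before that point is a ring node and the message would have to detour along the ring. All function names are illustrative only.

```python
import random

def min_direction(a, b, k):
    """Direction (+1 or -1) of the shorter way from a to b on a ring of k nodes; ties go +1."""
    d = (b - a) % k
    return 1 if d <= k - d else -1

def dor_path(src, dst, k):
    """Nodes visited after src by X-then-Y minimal dimension-order routing on a k x k torus."""
    x, y = src
    path = []
    dx = min_direction(x, dst[0], k)
    while x != dst[0]:
        x = (x + dx) % k
        path.append((x, y))
    dy = min_direction(y, dst[1], k)
    while y != dst[1]:
        y = (y + dy) % k
        path.append((x, y))
    return path

def faulty_nodes(blocks, k):
    """Union of rectangular fault blocks, each given as (x0, y0, width, height)."""
    faulty = set()
    for x0, y0, w, h in blocks:
        for i in range(w):
            for j in range(h):
                faulty.add(((x0 + i) % k, (y0 + j) % k))
    return faulty

def ring_nodes(blocks, k):
    """Fault-free nodes surrounding each block (including diagonal neighbours).

    Overlapping rings of nearby blocks share nodes, so the union is returned.
    Assumes k >= width + 2 and k >= height + 2 so a ring does not wrap onto itself.
    """
    faulty = faulty_nodes(blocks, k)
    ring = set()
    for x0, y0, w, h in blocks:
        for i in range(-1, w + 1):
            for j in range(-1, h + 1):
                node = ((x0 + i) % k, (y0 + j) % k)
                if node not in faulty:
                    ring.add(node)
    return ring

def estimate_facing_probability(k, blocks, trials=100_000, seed=0):
    """Monte Carlo estimate of P(a message's routing path runs into a fault block)."""
    rng = random.Random(seed)
    faulty = faulty_nodes(blocks, k)
    healthy = [(x, y) for x in range(k) for y in range(k) if (x, y) not in faulty]
    hits = 0
    for _ in range(trials):
        src, dst = rng.sample(healthy, 2)  # distinct fault-free endpoints
        if any(node in faulty for node in dor_path(src, dst, k)):
            hits += 1  # the message would have to detour along a fault ring
    return hits / trials

if __name__ == "__main__":
    k = 8
    single = [(2, 2, 2, 2)]
    overlapping = [(1, 1, 2, 2), (4, 1, 2, 2)]  # rings share the column x = 3
    print(len(ring_nodes(single, k)))       # 12
    print(len(ring_nodes(overlapping, k)))  # 20
    print(estimate_facing_probability(k, single, trials=20_000))
    print(estimate_facing_probability(k, overlapping, trials=20_000))
```

The second fault pattern places two blocks one column apart so that their rings overlap; comparing it with the single-block case illustrates the "with and without overlapping" distinction, although the estimates depend entirely on the assumed routing and traffic model.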
