On the Probability of Facing Fault Patterns: A Performance and Comparison Measure of Network Fault-Tolerance

An important issue in the design and deployment of interconnection networks is network fault-tolerance under various types of failures. In designing parallel processors that use a torus as the underlying interconnection topology, as well as in designing real applications on such processors, estimates of network reliability and fault-tolerance are important for choosing routing algorithms and predicting their performance in the presence of faulty nodes. Under the node-failure model, faulty nodes may coalesce into fault patterns, which are classified into two major categories: convex (|-shaped, $\Box$-shaped) and concave (L-shaped, T-shaped, +-shaped, H-shaped, U-shaped) regions. In this correspondence, we propose the first solution for computing the probability of a message facing fault patterns in tori, for both convex and concave regions, and verify it using simulation experiments. Our approach works for any number of faults as long as the network remains connected. We use these models to measure the network fault-tolerance that can be achieved by adaptive routing, and to assess the impact of various fault patterns on the performance of such networks.
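To make the quantity being measured concrete, the following is a minimal Monte Carlo sketch of how one might estimate the probability that a message faces a fault pattern in a 2-D torus. It is not the analytical model proposed here. It assumes a k x k torus, a single rectangular ($\Box$-shaped) fault region, uniformly random healthy source and destination nodes, and deterministic dimension-order minimal routing; the function names `xy_path`, `block_faults`, and `estimate_hit_probability` are hypothetical.

```python
import random


def xy_path(src, dst, k):
    """Nodes visited by dimension-order (x, then y) minimal routing on a k x k torus."""
    def offset(a, b):
        # Signed shortest offset from a to b around a ring of size k.
        d = (b - a) % k
        return d if d <= k // 2 else d - k

    x, y = src
    path = [(x, y)]
    dx = offset(x, dst[0])
    for _ in range(abs(dx)):
        x = (x + (1 if dx > 0 else -1)) % k
        path.append((x, y))
    dy = offset(y, dst[1])
    for _ in range(abs(dy)):
        y = (y + (1 if dy > 0 else -1)) % k
        path.append((x, y))
    return path


def block_faults(x0, y0, w, h, k):
    """Assumed fault pattern: a w x h rectangular block with corner (x0, y0)."""
    return {((x0 + i) % k, (y0 + j) % k) for i in range(w) for j in range(h)}


def estimate_hit_probability(k, faults, trials=10000, seed=0):
    """Fraction of random healthy source/destination pairs whose path meets a faulty node."""
    rng = random.Random(seed)
    healthy = [(x, y) for x in range(k) for y in range(k) if (x, y) not in faults]
    hits = 0
    for _ in range(trials):
        src, dst = rng.sample(healthy, 2)
        if any(node in faults for node in xy_path(src, dst, k)):
            hits += 1
    return hits / trials
```

For example, `estimate_hit_probability(8, block_faults(2, 2, 2, 2, 8))` gives an empirical estimate for a 2 x 2 block in an 8 x 8 torus. Concave patterns such as L- or U-shaped regions could be modeled by passing a different set of faulty coordinates.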
