Performance analysis of fault-tolerant routing algorithm in wormhole-switched interconnections

Abstract

With the growing popularity of large-scale parallel computers, Multiprocessor Systems-on-Chip (MP-SoCs), multicomputers, cluster computers and peer-to-peer communication networks, fault-tolerant routing has become an important issue in developing these systems. Fault-tolerant routing algorithms in such systems aim to provide continuous operation in the presence of one or more failures by allowing the system to degrade gracefully. The Software-Based fault-tolerant routing scheme has been suggested as an efficient routing algorithm that meets both the communication-performance and the fault-tolerance demands of parallel computer systems. A number of analytical models of network performance under fault-free routing algorithms have been proposed in the literature. However, no similar analytical model of fault-tolerant routing in the presence of faulty components has been reported. This paper presents a new analytical modeling approach for determining the effects of failures in wormhole-switched 2-D tori using the fault-tolerant Software-Based scheme. More specifically, we describe a general model from which mathematical expressions are derived to investigate the performance behavior of routing algorithms confronting convex (|-shaped, □-shaped) or concave (U-shaped, +-shaped, T-shaped, H-shaped) faulty regions. The model is validated through comprehensive simulation experiments for different types of failures.
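The abstract groups faulty regions into convex and concave shapes. One plausible reading is that a convex region is a solid rectangular block of faulty nodes (a |-shaped line or a □-shaped block), and a concave region is any other shape that leaves healthy nodes inside its bounding box. This reading is an assumption and is not taken from the paper. The minimal Python sketch below encodes it for a few small example regions that do not wrap around the torus edges. The names `is_rectangular`, `classify` and `SHAPES`, and the node coordinates, are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, Tuple

Node = Tuple[int, int]  # (x, y) position of a router in a 2-D torus


def is_rectangular(region: FrozenSet[Node]) -> bool:
    """True if the faulty nodes exactly fill their axis-aligned bounding box.

    Assumption: the region does not wrap around the torus edges.
    """
    xs = [x for x, _ in region]
    ys = [y for _, y in region]
    area = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    return len(region) == area


def classify(region: FrozenSet[Node]) -> str:
    # Assumed taxonomy: solid rectangles are "convex", anything else "concave".
    return "convex" if is_rectangular(region) else "concave"


# Hypothetical example regions placed inside a torus, away from its edges.
SHAPES: Dict[str, FrozenSet[Node]] = {
    "|": frozenset((2, y) for y in range(2, 6)),
    "□": frozenset((x, y) for x in range(2, 5) for y in range(2, 5)),
    "U": frozenset({(2, 2), (2, 3), (2, 4), (3, 2), (4, 2), (4, 3), (4, 4)}),
    "+": frozenset({(3, 2), (3, 3), (3, 4), (2, 3), (4, 3)}),
    "T": frozenset({(2, 4), (3, 4), (4, 4), (3, 3), (3, 2)}),
    "H": frozenset({(2, 2), (2, 3), (2, 4), (3, 3), (4, 2), (4, 3), (4, 4)}),
}

if __name__ == "__main__":
    for name, region in SHAPES.items():
        print(f"{name}: {len(region)} faulty nodes -> {classify(region)}")
```

Under this reading, the |-shaped and □-shaped regions come out convex and the U, +, T and H regions come out concave. This matches the grouping in the abstract. A more faithful model would also have to handle regions that wrap around the torus.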
