Fault-tolerance improvement of planar adaptive routing based on detailed traffic analysis

Routing protocols are currently compared using coarse measures such as global network latency. These measures give little insight into how traffic is distributed among network nodes in the presence of different fault regions. This paper presents a detailed traffic analysis of the fault-tolerant planar adaptive routing (FTPAR) algorithm, carried out with a specially developed tool. Per-node traffic analysis reveals the traffic hotspots caused by fault regions and helps in developing fault-tolerant routing algorithms. Based on this detailed information, a simple yet effective improvement to FTPAR is suggested. The effect of a traffic hotspot on the traffic of neighboring nodes and on global performance degradation is also investigated. To analyze per-node traffic, several per-node traffic metrics are introduced, and one of them is selected for the rest of the work. This paper is the first attempt to investigate per-node traffic around fault regions, with the aim of gaining a deeper understanding of traffic in faulty networks.
