Fault-Tolerant Adaptive Deadlock-Recovery Routing for k-ary n-cube Networks

This paper proposes a fault-tolerant, fully adaptive deadlock-recovery routing algorithm for k-ary n-cube networks. The algorithm addresses both adaptability to faults and communication performance by integrating routing techniques for regular and irregular networks. It tolerates any number and shape of faults without disabling fault-free nodes by maintaining routing tables that are configured from fault information. It also provides paths with minimal misrouting around faults while guaranteeing deadlock freedom with only two virtual channels per physical channel. Simulation results show that the proposed algorithm attains robust communication performance for uniform and nonuniform traffic patterns, not only on a fault-free torus network but also on irregular tori with faulty nodes.
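
To make the idea of routing tables configured from fault information concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that each table can be filled by a breadth-first search from the destination over the healthy nodes of the torus, so that every healthy node learns its shortest fault-avoiding distance and the set of neighbors on such a path. The names `neighbors` and `build_routing_table` are hypothetical, and the sketch does not model virtual channels or deadlock recovery.

```python
from collections import deque


def neighbors(node, k):
    """Yield the torus neighbors of `node` (a coordinate tuple) in a k-ary n-cube."""
    for dim in range(len(node)):
        for step in (1, -1):
            nxt = list(node)
            nxt[dim] = (nxt[dim] + step) % k  # wraparound link
            yield tuple(nxt)


def build_routing_table(dest, k, faulty):
    """Assumed table construction: BFS from `dest` over healthy nodes only.

    Returns {node: (distance, next_hops)}, where next_hops lists every healthy
    neighbor that lies on a shortest fault-avoiding path toward `dest`.
    """
    faulty = set(faulty)
    if dest in faulty:
        return {}
    dist = {dest: 0}
    queue = deque([dest])
    while queue:
        cur = queue.popleft()
        for nb in neighbors(cur, k):
            if nb not in faulty and nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    table = {}
    for node, d in dist.items():
        # A set removes the duplicate neighbor that appears when k == 2.
        hops = sorted({nb for nb in neighbors(node, k) if dist.get(nb) == d - 1})
        table[node] = (d, hops)
    return table


# 4-ary 2-cube (4x4 torus), destination (0, 0).
fault_free = build_routing_table((0, 0), 4, faulty=[])
with_faults = build_routing_table((0, 0), 4, faulty=[(1, 0), (3, 0)])
print(fault_free[(2, 0)])   # (2, [(1, 0), (3, 0)])
print(with_faults[(2, 0)])  # (4, [(2, 1), (2, 3)])
```

In this example, node (2, 0) is two hops from (0, 0) on the fault-free torus. When both of its minimal next hops fail, the table still offers two alternatives at the cost of two extra hops, which is the kind of bounded detour the abstract calls a minimal misrouting path. No fault-free node is disabled: every healthy node that can still reach the destination keeps a table entry.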
