Deadlock-Free Dynamic Reconfiguration Schemes for Increased Network Dependability

Network-based parallel computing systems often need to reconfigure the routing algorithm when the network topology changes, whether voluntarily or involuntarily. If not handled properly, reconfiguring a network's routing capabilities can be very inefficient, deadlock-prone, or both. We propose efficient, deadlock-free dynamic reconfiguration schemes applicable to routing algorithms and networks that use wormhole, virtual cut-through, or store-and-forward switching combined with hard link-level flow control. One requirement is that the network architecture use virtual channels or duplicate physical channels for deadlock handling as well as for performance. The proposed schemes do not impede the injection, transmission, or delivery of user packets during reconfiguration. Instead, they provide uninterrupted service, increased availability and reliability, and improved overall quality-of-service support compared with traditional techniques based on static reconfiguration.
