Distributed Deadlock-Free Routing in Faulty, Pipelined, Direct Interconnection Networks

This paper focuses on designing high-performance pipelined networks that can operate in the presence of dynamic component failures. A general, rigorous framework for deadlock-free communication in faulty, pipelined networks is developed. A mechanism is also proposed for recovering from dynamic link and node failures. The recovery mechanism (1) is fully distributed, (2) does not require timeouts, (3) prevents fault-induced deadlock, and (4) is integrated into the virtual channel flow control mechanisms. This recovery mechanism is used to develop a new pipelined communication mechanism: acknowledged pipelined circuit-switching (APCS). APCS supports existing routing protocols that can tolerate a maximal number of static link failures, i.e., one less than the number of ports on a node. An implementation of a novel router architecture is described, and the results of detailed flit-level simulations are presented. Finally, the proposed recovery mechanism is shown to be applicable to existing adaptive wormhole routing protocols, which are prone to deadlock in the presence of dynamic faults.
