A Theory of Fault-Tolerant Routing in Wormhole Networks

Fault-tolerant systems aim to provide continuous operation in the presence of faults. Multicomputers rely on an interconnection network between processors to support message passing, so the reliability of that network is critical to the reliability of the whole system. This paper analyzes the effective redundancy available in a wormhole network by combining connectivity and deadlock freedom. Redundancy is defined at the channel level. We propose a sufficient condition for channel redundancy and compute the set of redundant channels. We also define the redundancy level of the network and give a theorem that supplies its value. The theory builds on our necessary and sufficient condition for deadlock-free adaptive routing, and it also covers the failure of physical channels when virtual channels are used. Finally, we propose a methodology for the design of fault-tolerant routing algorithms and show its application to n-dimensional meshes.
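To make the idea of channel-level redundancy concrete, the following is a minimal sketch, not the paper's method. It assumes one plausible reading of redundancy: a channel is treated as redundant if the routing function stays connected and deadlock-free after that channel is removed. Deadlock freedom is approximated here by the older, merely sufficient test that the channel dependency graph is acyclic, rather than by the necessary and sufficient condition the abstract refers to. The three-node network, the routing table, and all function names are hypothetical.

```python
from itertools import product

def make_routing(table):
    """Wrap a dict {(node, dest): set of channels} as a routing function."""
    return lambda node, dest: table.get((node, dest), set())

def without_channel(route, faulty):
    """Routing function obtained by removing one (assumed faulty) channel."""
    return lambda node, dest: {c for c in route(node, dest) if c != faulty}

def is_connected(nodes, route):
    """True if every destination is reachable from every source via `route`."""
    for src, dst in product(nodes, repeat=2):
        if src == dst:
            continue
        seen, stack = {src}, [src]
        while stack:
            node = stack.pop()
            for (_, nxt) in route(node, dst):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if dst not in seen:
            return False
    return True

def dependency_graph(nodes, route):
    """Channel dependency graph: c1 -> c2 if a message may hold c1 and request c2."""
    deps = {}
    for node, dst in product(nodes, repeat=2):
        if node == dst:
            continue
        for chan in route(node, dst):
            nxt = chan[1]
            deps.setdefault(chan, set())
            if nxt != dst:
                deps[chan] |= route(nxt, dst)
    return deps

def is_acyclic(deps):
    """Depth-first search for a cycle (simplified, sufficient-only deadlock test)."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(chan):
        color[chan] = GREY
        for nxt in deps.get(chan, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return False
            if state == WHITE and not visit(nxt):
                return False
        color[chan] = BLACK
        return True

    return all(color.get(c, WHITE) != WHITE or visit(c) for c in list(deps))

def redundant_channels(nodes, channels, route):
    """Channels whose removal keeps routing connected and the graph acyclic."""
    result = set()
    for chan in channels:
        reduced = without_channel(route, chan)
        if is_connected(nodes, reduced) and is_acyclic(dependency_graph(nodes, reduced)):
            result.add(chan)
    return result

# Hypothetical example: three fully connected nodes, channels named (src, dst).
NODES = [0, 1, 2]
CHANNELS = [(a, b) for a in NODES for b in NODES if a != b]
TABLE = {(a, b): {(a, b)} for (a, b) in CHANNELS}  # direct channel only
TABLE[(0, 2)].add((0, 1))                          # 0 may also reach 2 via node 1
ROUTE = make_routing(TABLE)

print(redundant_channels(NODES, CHANNELS, ROUTE))  # prints {(0, 2)}
```

In this toy network only channel (0, 2) is redundant, because node 0 has an alternative route to node 2 through node 1, while every other channel is the sole route between some pair of nodes. Under the acyclicity simplification, removing a channel can only delete dependencies, so the deadlock check matters mainly when the baseline routing function is itself cyclic; the full theory is needed to handle routing functions whose dependency graphs contain cycles yet remain deadlock-free.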
