Communication in Multicomputers with Nonconvex Faults

A technique to enhance multicomputer routers for fault-tolerant routing with a modest increase in routing complexity and resource requirements is described. The method handles solid faults in meshes, which include all convex faults and many practical nonconvex faults, for example, L-shaped or T-shaped faults. As examples of the proposed method, adaptive and nonadaptive fault-tolerant routing algorithms using four virtual channels per physical channel are described.
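To make the convex/nonconvex distinction concrete, the following is a minimal sketch for a 2D mesh. It assumes the convention, common in mesh fault models but not stated in the abstract, that a convex fault is a rectangular block of faulty nodes. The function name `is_block_fault` and the coordinate sets are hypothetical and chosen only for illustration; the sketch does not implement the paper's routing method or its definition of a solid fault.

```python
def is_block_fault(faulty):
    """Return True if the faulty nodes exactly fill their bounding rectangle.

    Assumption (not from the abstract): a convex fault in a 2D mesh is a
    rectangular block, so the fault set must equal its bounding box.
    """
    if not faulty:
        return False
    xs = [x for x, _ in faulty]
    ys = [y for _, y in faulty]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    # A block fault contains every node inside its bounding rectangle.
    return len(set(faulty)) == width * height


# Hypothetical fault sets given as (x, y) mesh coordinates.
block = {(1, 1), (1, 2), (2, 1), (2, 2)}             # 2x2 block
l_shape = {(1, 1), (1, 2), (1, 3), (2, 1)}           # L-shaped region
t_shape = {(1, 3), (2, 3), (3, 3), (2, 2), (2, 1)}   # T-shaped region

results = {
    "block": is_block_fault(block),
    "l_shape": is_block_fault(l_shape),
    "t_shape": is_block_fault(t_shape),
}
```

Under this assumption, the L-shaped and T-shaped sets fail the block test, which is why they count as nonconvex faults of the kind the abstract mentions.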
