Spare processor allocation for fault tolerance in torus-based multicomputers

Some fault-tolerant architectures use spare nodes or links to replace faulty components. This paper gives solutions to the spare processor placement problem for torus-based networks. It describes optimal 1-hop spare processor placement methods for multi-dimensional tori and t-hop placement methods for 2D tori. A system reconfiguration scheme that uses the spare nodes when nodes fail is also given.
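
To make the idea of 1-hop placement concrete, the following is a minimal sketch and not necessarily the construction used in the paper. It assumes a k x k 2D torus with k a multiple of 5 and uses the standard perfect-code rule (x + 2y) mod 5 = 0 to choose spares, so that every non-spare node has exactly one spare among its four neighbours. The function names `spare_nodes`, `lee_distance`, and `covering_spares` are hypothetical and introduced only for illustration.

```python
from itertools import product


def spare_nodes(k):
    """Return spare positions in a k x k torus (assumption: k is a multiple of 5).

    A node (x, y) is a spare when (x + 2*y) % 5 == 0.
    """
    if k % 5 != 0:
        raise ValueError("this sketch assumes k is a multiple of 5")
    return {(x, y) for x, y in product(range(k), repeat=2)
            if (x + 2 * y) % 5 == 0}


def lee_distance(a, b, k):
    """Hop count between nodes a and b in a torus with k nodes per dimension."""
    return sum(min((p - q) % k, (q - p) % k) for p, q in zip(a, b))


def covering_spares(node, spares, k):
    """List the spares within one hop of node (a spare covers itself)."""
    return [s for s in spares if lee_distance(node, s, k) <= 1]


if __name__ == "__main__":
    k = 10
    spares = spare_nodes(k)
    # Every node should be covered by exactly one spare.
    ok = all(len(covering_spares(n, spares, k)) == 1
             for n in product(range(k), repeat=2))
    print(len(spares), "spares; every node covered exactly once:", ok)
```

Under this rule one node in five is a spare, which is the fewest possible for 1-hop coverage in a 2D torus, since a spare can cover at most itself and its four neighbours.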
