Enhanced Cluster k-Ary n-Cube, A Fault-Tolerant Multiprocessor

We present a strongly fault-tolerant design for the k-ary n-cube multiprocessor and examine its reconfigurability. Our design augments the k-ary n-cube with $(k/j)^n$ spare nodes. Each set of $j^n$ regular nodes is connected to a spare node, and the spare nodes are interconnected either as a (k/j)-ary n-cube if $j \neq k/2$ or as a hypercube of dimension n if $j = k/2$. Our approach exploits the wave-switching communication modules of the spare nodes to tolerate a large number of faulty nodes. We examine both theoretical and experimental results. Compared with other proposed schemes, our approach can tolerate significantly more faulty nodes with low overhead and no performance degradation.
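The sketch below illustrates only the spare-node topology described above. It does not cover the reconfiguration procedure or wave switching. It rests on several assumptions that the abstract does not state: j divides k, each set of $j^n$ regular nodes is a contiguous j × ... × j block of coordinates, and the spare network uses the usual wrap-around (torus) links of a k-ary n-cube. The function names `spare_of`, `clusters`, and `spare_neighbors` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Node = Tuple[int, ...]  # coordinates (a_0, ..., a_{n-1}), each in range(radix)


def spare_of(node: Node, j: int) -> Node:
    """Map a regular node to its spare node.

    Assumption: clusters are contiguous j x ... x j blocks, so the spare
    coordinate in each dimension is the block index a_i // j.
    """
    return tuple(a // j for a in node)


def clusters(k: int, n: int, j: int) -> Dict[Node, List[Node]]:
    """Group all k**n regular nodes by the spare node assigned to them."""
    if k % j != 0:
        raise ValueError("this sketch assumes j divides k")
    groups: Dict[Node, List[Node]] = {}
    for node in product(range(k), repeat=n):
        groups.setdefault(spare_of(node, j), []).append(node)
    return groups


def spare_neighbors(spare: Node, k: int, j: int) -> Set[Node]:
    """Neighbors of a spare in the (k/j)-ary n-cube with wrap-around links.

    When k/j == 2, the +1 and -1 neighbors in each dimension coincide, so
    each spare has n distinct neighbors: the spare network collapses to an
    n-dimensional hypercube, which matches the j = k/2 case in the text.
    """
    m = k // j  # radix of the spare network
    nbrs: Set[Node] = set()
    for dim in range(len(spare)):
        for step in (1, -1):
            s = list(spare)
            s[dim] = (s[dim] + step) % m
            nbrs.add(tuple(s))
    nbrs.discard(spare)  # m == 1 (a single spare) would otherwise add a self-loop
    return nbrs
```

Take k = 4, n = 2, and j = 2, so that j = k/2. In that case `clusters(4, 2, 2)` yields four spares, each serving four regular nodes, and `spare_neighbors((0, 0), 4, 2)` returns the set {(1, 0), (0, 1)}, so the spares form a 2-dimensional hypercube. With k = 6 and j = 2, the spare network is a 3-ary 2-cube and each spare has four neighbors. In every case the spare count is $(k/j)^n = k^n / j^n$, which is one spare per $j^n$ regular nodes.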
