Designing Fault-Tolerant Systems Using Automorphisms

Abstract: This paper presents a general theory for modeling and designing fault-tolerant multiprocessor systems systematically and efficiently. We are concerned with structural fault tolerance, defined as the ability to reconfigure around faults so as to preserve the interconnection structure of a multiprocessor. We represent multiprocessor systems by graphs whose nodes denote processors and whose edges denote dedicated interprocessor links. The fault-tolerant design and reconfiguration process of a multiprocessor is modeled by graph automorphisms. This automorphism-based methodology also covers important practical design features not previously addressed, including applicability to any multiprocessor structure and to any number of faults, as well as low redundancy and efficient reconfigurability. We apply our approach directly to a class of regular multiprocessor graphs termed circulant graphs. For noncirculant graphs, we give an algorithm that efficiently constructs their circulant edge supergraphs. An application of the theory to the design of fault-tolerant hypercube multiprocessors is described. The resulting designs are shown to be far superior to those proposed in previous work.
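To make the two central terms concrete, the following minimal sketch builds a small circulant graph and checks whether a node relabeling is an automorphism, that is, a bijection on the nodes that maps the edge set onto itself. It illustrates the definitions only and is not the design or reconfiguration procedure of the paper; the function names, the graph size n = 8, and the offset set (1, 2) are assumptions chosen for illustration.

```python
def circulant_edges(n, offsets):
    """Edge set of the circulant graph on nodes 0..n-1.

    Node i is linked to node (i + s) mod n for every offset s.
    Offsets are assumed to be nonzero modulo n (no self-loops).
    """
    return {frozenset((i, (i + s) % n)) for i in range(n) for s in offsets}


def is_automorphism(edges, mapping):
    """Return True if the node bijection `mapping` maps the edge set onto itself."""
    image = {frozenset(mapping[v] for v in edge) for edge in edges}
    return image == edges


# Illustrative parameters (assumptions, not taken from the paper).
n, offsets = 8, (1, 2)
edges = circulant_edges(n, offsets)

# Rotating every node label by one position preserves all links.
rotation = {i: (i + 1) % n for i in range(n)}

# Exchanging only nodes 0 and 1 does not: link {0, 6} would become {1, 6}.
swap = {i: i for i in range(n)}
swap[0], swap[1] = 1, 0

print(is_automorphism(edges, rotation))
print(is_automorphism(edges, swap))
```

The rotation check succeeds for any circulant graph, because shifting all labels by the same amount leaves every offset unchanged; the swap is included only as a contrasting relabeling that breaks the structure.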
