Communication structures in fault-tolerant distributed systems

The impact of communication structures on the robustness and performance of distributed computing systems is discussed. Communication structures are categorized as (1) point-to-point, (2) bus, and (3) multistage switching, and the most significant networks that have been proposed are discussed. Networks are compared with respect to (1) connectivity, (2) diameter, (3) average distance, (4) diameter in the presence of faults, (5) extensibility (ease of adding or deleting processors), and (6) ease of routing in the presence of faults. Some attempts at achieving near-optimality for each of these six factors are discussed, and a strategy for designing networks that perform well with respect to all of them is suggested. Attempts to predict the real-time robustness of systems are reviewed, and the power and consistency of the metrics associated with such attempts are discussed. The impact of communication structures is also considered with respect to fundamental fault-tolerant objectives, such as diagnosis and disconnection of faulty units and establishment of a logical link between units not connected by a physical link. Finally, current and suggested trends in communication network research are commented on. © 1993 by John Wiley & Sons, Inc.
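
Three of the comparison factors listed above (diameter, average distance, and diameter in the presence of faults) can be made concrete with a small computation. The following is a minimal sketch, not taken from the paper: it assumes a binary hypercube as the example point-to-point network, models faults as removed nodes, and uses hypothetical helper names (`hypercube`, `bfs_distances`, `diameter_and_average_distance`, `fault_diameter`) chosen only for illustration.

```python
from collections import deque
from itertools import combinations


def hypercube(n):
    """Adjacency map of the n-dimensional binary hypercube (2**n nodes).

    The hypercube is an assumed example topology, not one singled out by the text.
    """
    return {v: {v ^ (1 << i) for i in range(n)} for v in range(2 ** n)}


def bfs_distances(adj, source, removed=frozenset()):
    """Hop counts from `source` to every reachable node, skipping faulty nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in removed and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter_and_average_distance(adj, removed=frozenset()):
    """Return (diameter, average distance) over the surviving nodes.

    Returns (None, None) if the surviving network is disconnected.
    """
    nodes = [v for v in adj if v not in removed]
    if len(nodes) < 2:
        return 0, 0.0
    longest, total, pairs = 0, 0, 0
    for s in nodes:
        dist = bfs_distances(adj, s, removed)
        if len(dist) != len(nodes):
            return None, None  # some surviving node is unreachable
        longest = max(longest, max(dist.values()))
        total += sum(dist.values())
        pairs += len(nodes) - 1
    return longest, total / pairs


def fault_diameter(adj, faults):
    """Worst-case diameter over every set of `faults` faulty nodes.

    Exhaustive enumeration, so only practical for small networks.
    Returns None if some fault set disconnects the network.
    """
    worst = 0
    for removed in combinations(adj, faults):
        d, _ = diameter_and_average_distance(adj, frozenset(removed))
        if d is None:
            return None
        worst = max(worst, d)
    return worst


if __name__ == "__main__":
    q3 = hypercube(3)
    print(diameter_and_average_distance(q3))
    print(fault_diameter(q3, 1), fault_diameter(q3, 2))
```

Exhaustive enumeration of fault sets is used here only because it is easy to check by hand; for networks of realistic size, bounds or sampling would be needed instead.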
