Communication delay analysis of fault-tolerant pipelined circuit switching in torus

Large-scale parallel systems, Multiprocessor Systems-on-Chip (MP-SoCs), multicomputers, and cluster computers are often composed of hundreds or thousands of components (such as routers, channels, and connectors) that collectively have higher failure rates than those found in ordinary systems. One of the most important issues in the design of such systems is the development of efficient fault-tolerant mechanisms that provide high-throughput, low-latency communication, so that the system keeps running in a degraded mode until the faulty components are repaired. Pipelined Circuit Switching (PCS) has been suggested as an efficient switching method for inter-processor communication in such networks because it can meet both the performance and the fault-tolerance demands of these systems. This paper presents a new mathematical model that investigates the effects of failures and captures the mean message latency in torus networks using PCS in the presence of faulty components. Simulation experiments confirm that the analytical model exhibits a good degree of accuracy under different working conditions.
