Analytical modeling and comparison of fault-tolerant message flow control mechanisms in torus-connected networks

In many environments, the ability to survive the failure of individual network components matters more than minimizing message latency or maximizing network performance. Wormhole Switching (WS) delivers high network throughput and low message latency. However, near faulty regions the same behavior causes rapid congestion and can drive the network into deadlock. Techniques such as adaptive routing can alleviate the problem but cannot solve it completely. Consequently, the networking and multicomputer communities have studied alternative switching mechanisms extensively. In this paper, we present a general mathematical model to assess the relative performance merits of three well-known fault-tolerant switching methods in tori, namely Scouting Switching (SS), Pipelined Circuit Switching (PCS), and Circuit Switching (CS). We have carried out extensive simulation experiments, the results of which are used to validate the proposed analytical models. We have also used the analytical models to conduct an extensive comparative performance analysis of SS, PCS, and CS under various working conditions. The analytical results reveal that, at low to moderate failure rates, SS achieves substantial performance improvements over PCS and CS and approaches the performance of WS. PCS outperforms CS and, under light and moderate traffic, performs the same as or occasionally worse than SS, especially when both have the same hardware requirements.
