Network performance under physical constraints

The performance of an interconnection network in a massively parallel architecture is subject to physical constraints whose impact needs to be re-evaluated from time to time. Fat-trees and low-dimensional cubes have attracted great interest in the scientific community in the last few years and are emerging standards in the design of interconnection networks for massively parallel computers. In this paper we compare the communication performance of these two classes of interconnection networks using a detailed simulation model. The comparison is made using a set of synthetic benchmarks and takes into account physical constraints, such as pin and bandwidth limitations, as well as router complexity. In our experiments we consider two networks with 256 nodes: a 16-ary 2-cube and a 4-ary 4-tree.
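The node counts quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and the formulas assume the usual definitions, in which a k-ary n-cube has k nodes along each of n dimensions and a k-ary n-tree connects k^n processing nodes through n levels of k^(n-1) switches each.

```python
def cube_nodes(k: int, n: int) -> int:
    """Nodes in a k-ary n-cube: k positions along each of n dimensions."""
    return k ** n


def tree_processing_nodes(k: int, n: int) -> int:
    """Processing nodes (leaves) of a k-ary n-tree, assuming the usual definition."""
    return k ** n


def tree_switches(k: int, n: int) -> int:
    """Switches in a k-ary n-tree: n levels of k**(n-1) switches (assumed definition)."""
    return n * k ** (n - 1)


if __name__ == "__main__":
    # The two configurations named in the abstract.
    print("16-ary 2-cube nodes:", cube_nodes(16, 2))
    print("4-ary 4-tree processing nodes:", tree_processing_nodes(4, 4))
    print("4-ary 4-tree switches:", tree_switches(4, 4))
```

Under these assumptions both configurations attach 256 processing nodes, which is why the two networks can be compared at equal size.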
