Alleviating Consumption Channel Bottleneck in Wormhole-Routed k-ary n-Cube Systems

This paper identifies performance degradation in wormhole-routed k-ary n-cube networks caused by the limited number of router-to-processor consumption channels at each node. Much recent research on wormhole routing has advocated adaptive routing and virtual channel flow control as ways to deliver better network performance. This paper shows that the advantages of these schemes cannot be realized when consumption capacity is limited. To alleviate this bottleneck, a new network interface design using multiple consumption channels is proposed. To match virtual multiplexing on network channels, we also propose that each consumption channel support multiple virtual consumption channels. The impact of the message arrival rate at a node on the required number of consumption channels is studied analytically. It is shown that wormhole networks with higher routing adaptivity, dimensionality, degree of hot-spot traffic, and number of virtual lanes must use multiple consumption channels to deliver better performance. The interplay between system topology, routing algorithm, number of virtual lanes, messaging overheads, and communication traffic is studied through simulation to derive the effective number of consumption channels required in a system. Based on current technological trends, it is shown that wormhole-routed systems can use between two and four consumption channels per node to deliver better system performance.
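As a rough illustration of why the message arrival rate at a node drives the number of consumption channels it needs, the following sketch applies a simple utilization bound. It is not the analytical model used in the paper: the function name `min_consumption_channels`, the parameters `flits_per_cycle` and `max_utilization`, and the assumption that a message occupies one consumption channel for its full length are all hypothetical choices made for this example.

```python
import math

def min_consumption_channels(arrival_rate, message_length_flits,
                             flits_per_cycle=1.0, max_utilization=0.75):
    """Smallest channel count that keeps average per-channel utilization
    at or below max_utilization (illustrative assumption, not the paper's model).

    arrival_rate: messages arriving at the node per cycle
    message_length_flits: flits per message
    flits_per_cycle: drain rate of one consumption channel
    """
    if arrival_rate < 0 or message_length_flits <= 0 or flits_per_cycle <= 0:
        raise ValueError("rates and lengths must be positive")
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    # Cycles one message holds a consumption channel.
    service_time = message_length_flits / flits_per_cycle
    # Average number of channels busy at this arrival rate.
    offered_load = arrival_rate * service_time
    return max(1, math.ceil(offered_load / max_utilization))

if __name__ == "__main__":
    # Higher arrival rates (for example at a hot-spot node) need more channels.
    for rate in (0.01, 0.06, 0.1):
        print(rate, min_consumption_channels(rate, 32))
```

Under these assumptions the required count grows linearly with the arrival rate, which is consistent with the qualitative claim that hot-spot traffic and higher adaptivity, both of which raise the delivery rate at a node, call for more consumption channels.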
