Efficient Broadcast and Multicast on Multistage Interconnection Networks Using Multiport Encoding

This paper proposes a new approach for implementing fast multicast and broadcast in unidirectional and bidirectional multistage interconnection networks (MINs) using multiport encoded multidestination worms. For a MIN with $n$ stages, each such worm carries $n$ header flits. One flit is used per stage of the network, and it indicates the output ports of that stage's switch to which the multicast message must be replicated. A multiport encoded worm with degrees of replication $(d_1, d_2, \ldots, d_n)$ for the respective stages, where $1 \le d_i \le k$, can cover $d_1 \times d_2 \times \cdots \times d_n$ destinations with a single communication start-up. A switch architecture is proposed for implementing multidestination worms without deadlock. Three grouping algorithms of varying complexity are presented to derive the multiport encoded worms needed for a multicast to an arbitrary set of destinations. Using these worms, a multinomial tree-based scheme is proposed to implement the multicast. This scheme significantly reduces broadcast and multicast latency compared to schemes that use unicast messages. Simulation studies for both unidirectional and bidirectional MIN systems indicate that an improvement in broadcast/multicast latency of up to a factor of four is feasible with the new approach. Interestingly, once the number of destinations grows beyond a certain point, multicast latency under this approach decreases as more destinations are added. Compared to a switch that supports only unicast messages, the approach requires little additional logic at the switches. The scheme thus demonstrates significant potential for implementing efficient collective communication operations on current and future MIN-based systems.
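The coverage claim can be illustrated with a minimal sketch. It is not the paper's switch design or any of its grouping algorithms. It assumes a $k$-ary, $n$-stage MIN with destination-tag routing, in which the output port taken at stage $i$ equals digit $i$ of the destination address in base $k$, and it assumes each header flit is a $k$-bit mask. The function names `encode_header`, `covered_destinations`, and `single_worm_ports` are hypothetical.

```python
from itertools import product
from math import prod


def encode_header(port_sets, k):
    """Build one header flit per stage as a k-bit mask (assumed encoding).

    Bit p of flit i is set when the worm is replicated to output port p
    at stage i.
    """
    flits = []
    for ports in port_sets:
        if not ports or any(p < 0 or p >= k for p in ports):
            raise ValueError("each stage needs between 1 and k valid ports")
        flits.append(sum(1 << p for p in set(ports)))
    return flits


def covered_destinations(flits, k):
    """List the destinations reached by one worm under destination-tag routing."""
    per_stage = [[p for p in range(k) if (flit >> p) & 1] for flit in flits]
    dests = []
    for digits in product(*per_stage):
        addr = 0
        for d in digits:  # most significant digit is consumed at stage 0
            addr = addr * k + d
        dests.append(addr)
    return sorted(dests)


def single_worm_ports(dests, k, n):
    """Return per-stage port sets if one worm covers exactly `dests`, else None.

    A set is coverable by a single worm only when it equals the Cartesian
    product of its per-stage digit projections.
    """
    dests = set(dests)
    port_sets = [set() for _ in range(n)]
    for addr in dests:
        for stage in reversed(range(n)):
            port_sets[stage].add(addr % k)
            addr //= k
    if prod(len(s) for s in port_sets) == len(dests):
        return [sorted(s) for s in port_sets]
    return None


if __name__ == "__main__":
    k, n = 4, 3
    header = encode_header([[0, 1], [2], [0, 3]], k)  # degrees (2, 1, 2)
    print(header)                                     # [3, 4, 9]
    print(covered_destinations(header, k))            # [8, 11, 24, 27]
    print(single_worm_ports([8, 11, 24], k, n))       # None
```

With degrees of replication $(2, 1, 2)$ the worm reaches $2 \times 1 \times 2 = 4$ destinations. A destination set that is not such a product, like `[8, 11, 24]`, would need more than one worm, which is the situation the grouping algorithms address.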
