Optimal software multicast in wormhole-routed multistage networks

Multistage interconnection networks are a popular class of interconnection architecture for constructing scalable parallel computers (SPCs). This paper focuses on wormhole-routed multistage networks that support turnaround routing. Existing machines that fit this system model include the IBM SP-1, TMC CM-5, and Meiko CS-2. Efficient collective communication among processor nodes is critical to the performance of SPCs. A system-level multicast service, in which the same message is delivered from a source node to an arbitrary number of destination nodes, is fundamental to supporting collective communication primitives, including application-level broadcast, reduction, and barrier synchronization. The paper addresses how to implement multicast services efficiently in wormhole-routed multistage networks, in the absence of hardware multicast support, by exploiting the properties of the switching technology. An optimal multicast algorithm is proposed. Results from implementations on a 64-node SP-1 show that the proposed algorithm significantly outperforms the application-level broadcast primitives provided by existing collective communication libraries, including the public-domain MPI.
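To make the idea of software multicast without hardware support concrete, the following is a minimal sketch of a generic unicast-based multicast schedule built by recursive halving. It is not the paper's algorithm, whose details the abstract does not give. The function name `multicast_schedule`, the integer node IDs, and the ordering of nodes (source first, then destinations in ascending order) are all assumptions made for illustration. The sketch only shows how every node that already holds the message can forward it in each step, so that n nodes are covered in ceil(log2 n) unicast steps. It says nothing about whether the unicasts within a step contend for network channels.

```python
import math


def multicast_schedule(source, destinations):
    """Return a list of steps; each step is a list of (sender, receiver) unicasts.

    Hypothetical helper for illustration only. Nodes are plain integer IDs.
    The node ordering used here (source first, destinations ascending) is an
    assumption and is not claimed to avoid channel contention.
    """
    nodes = [source] + sorted(d for d in set(destinations) if d != source)

    # Each half-open range [lo, hi) is a group of nodes whose first member
    # (index lo) already holds the message and is responsible for the rest.
    ranges = [(0, len(nodes))]
    steps = []

    while any(hi - lo > 1 for lo, hi in ranges):
        sends = []
        next_ranges = []
        for lo, hi in ranges:
            if hi - lo <= 1:
                # Single node: nothing left to forward.
                next_ranges.append((lo, hi))
                continue
            # Split the group; the holder sends to the first node of the
            # upper half, which then takes over that half.
            mid = lo + (hi - lo + 1) // 2
            sends.append((nodes[lo], nodes[mid]))
            next_ranges.append((lo, mid))
            next_ranges.append((mid, hi))
        steps.append(sends)
        ranges = next_ranges

    return steps


if __name__ == "__main__":
    schedule = multicast_schedule(0, range(1, 8))
    for i, sends in enumerate(schedule, start=1):
        print(f"step {i}: {sends}")
    # The step count matches the ceil(log2 n) bound for n = 8 nodes.
    print(len(schedule) == math.ceil(math.log2(8)))
```

For a source 0 and destinations 1 through 7, the sketch produces three steps: one unicast in the first, two in the second, and four in the third. A scheme like this takes the same number of steps regardless of topology. Where an algorithm such as the one proposed in the paper differs is in choosing the ordering and the partners so that the network's switching properties are exploited.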
