Computing Global Combine Operations in the Multiport Postal Model

Consider a message-passing system of n processors, in which each processor holds one piece of data initially. The goal is to compute an associative and commutative reduction function on the n pieces of data and to make the result known to all the n processors. This operation is frequently used in many message-passing systems and is typically referred to as global combine, census computation, or gossiping. This paper explores the problem of global combine in the multiport postal model. This model is characterized by three parameters: n, the number of processors; k, the number of ports per processor; and λ, the communication latency. In this model, in every round r, each processor can send k distinct messages to k other processors, and it can receive k messages that were sent from k other processors λ-1 rounds earlier. This paper provides an optimal algorithm for the global combine problem that requires the least number of communication rounds and minimizes the time spent by any processor in sending and receiving messages.
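To make the model concrete, the following is a minimal simulation sketch of a naive baseline for global combine in the multiport postal model. It is not the optimal algorithm of the paper. It assumes a simple phased dissemination scheme: in each phase every processor sends its current knowledge to k others at offsets that are multiples of a growing stride, then waits the full latency before the next phase. The function name `simulate_global_combine`, its parameters, and the phased scheme itself are illustrative assumptions, not taken from the paper.

```python
from functools import reduce


def simulate_global_combine(values, k, lam, op):
    """Naive phased baseline (an assumption for illustration, not the
    paper's optimal algorithm).

    values: one initial data item per processor (n = len(values))
    k:      number of ports per processor
    lam:    communication latency (a message sent in round r is
            received in round r + lam - 1)
    op:     associative and commutative binary reduction function

    Returns (results, rounds): the combined value as seen by each
    processor, and the number of rounds this baseline uses.
    """
    n = len(values)
    # Track contributions by source id so that no item is combined twice,
    # which matters for non-idempotent operations such as addition.
    known = [{i: v} for i, v in enumerate(values)]

    stride = 1
    rounds = 0
    while stride < n:
        # All processors send simultaneously, so send from a snapshot.
        snapshot = [dict(d) for d in known]
        for i in range(n):
            for j in range(1, k + 1):  # at most k messages per round
                dest = (i + j * stride) % n
                if dest != i:  # messages go to other processors only
                    known[dest].update(snapshot[i])
        # The phase's messages arrive lam - 1 rounds after being sent;
        # this baseline idles until then, so each phase costs lam rounds.
        rounds += lam
        stride *= k + 1

    results = [reduce(op, d.values()) for d in known]
    return results, rounds
```

Under these assumptions, calling the sketch with the integers 1 through 9, k = 2, and λ = 3 gives every processor the sum 45 after 6 rounds. Because the baseline leaves processors idle while messages are in flight, it generally uses more rounds than an algorithm that keeps sending during the latency period, which is the kind of inefficiency an optimal algorithm must avoid.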
