Computing global combine operations in the multi-port postal model

Consider a message-passing system of n processors, each of which initially holds one piece of data. The goal is to compute an associative and commutative reduction function on the n distributed pieces of data and to make the result known to all the processors. We model the message-passing system using the parameter k for the k-port model and the parameter λ for the communication latency in the postal model. In this general model, during each round r every processor can send messages to any set of k processors and receive messages from any other set of k processors; the messages received in round r are those that were sent out during round r-λ+1, provided r-λ+1 ≥ 1. We describe an optimal algorithm that requires the least number of communication rounds and minimizes the time spent by each processor in sending and receiving messages.
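
To make the timing rule of the model concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes only what the abstract states: a message sent in round s is received in round s+λ-1, and each processor can send to at most k processors per round. From that it derives a simple counting bound on how far a single piece of data can spread in t rounds. The function names (arrival_round, max_informed, rounds_lower_bound) are hypothetical and chosen for illustration. Since every processor must end up with a result that depends on all n inputs, the smallest t for which one input can reach n processors is a lower bound on the number of rounds; whether it coincides with the optimal round count of the paper is not claimed here.

```python
def arrival_round(send_round: int, lam: int) -> int:
    """Round in which a message sent in `send_round` is received (latency lam >= 1)."""
    return send_round + lam - 1


def max_informed(t: int, k: int, lam: int) -> int:
    """Upper bound on how many processors can hold one datum after t rounds.

    A processor holding the datum at the end of round r - lam can send k
    messages in round r - lam + 1, and those arrive in round r. Hence
    informed[r] = informed[r - 1] + k * informed[r - lam] once r >= lam.
    """
    informed = [1] * (t + 1)  # only the origin holds the datum before any arrival
    for r in range(1, t + 1):
        newly = k * informed[r - lam] if r >= lam else 0
        informed[r] = informed[r - 1] + newly
    return informed[t]


def rounds_lower_bound(n: int, k: int, lam: int) -> int:
    """Smallest t such that one datum could have reached all n processors."""
    t = 0
    while max_informed(t, k, lam) < n:
        t += 1
    return t
```

With λ = 1 the recurrence reduces to (k+1)^t, the familiar k-port bound without latency; with k = 1 and λ = 2 it yields the Fibonacci-like sequence 1, 1, 2, 3, 5.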
