Many-to-many personalized communication with bounded traffic

This paper presents solutions for the problem of many-to-many personalized communication, with bounded incoming and outgoing traffic, on a distributed memory parallel machine. We present a two-stage algorithm that decomposes the many-to-many communication, which may have high variance in message size, into two communications with low message size variance. The algorithm is deterministic and takes time $2t\mu$ (plus lower-order terms) when $t \geq O(p^2 + p\tau/\mu)$. Here $t$ is the maximum outgoing or incoming traffic at any processor, $\tau$ is the startup overhead, and $\mu$ is the inverse of the data transfer rate. Optimality is achieved when the traffic is large, a condition that is usually satisfied in practice on coarse-grained architectures. The algorithm was implemented on the Connection Machine CM-5. The implementation used the low-latency communication primitives (active messages) available on the CM-5, but the algorithm as such is architecture-independent. An alternative single-stage algorithm using distributed random scheduling was also implemented on the CM-5, and the performance of the two algorithms was compared.
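The idea of turning a high-variance exchange into two low-variance exchanges can be illustrated with a small traffic-matrix simulation. The sketch below is an assumption, not necessarily the paper's exact scheme: it splits every message into $p$ near-equal pieces and routes piece $k$ through processor $k$, and it charges each stage with an assumed linear cost model (a shift schedule of $p$ rounds, each costing $\tau + \mu$ times the largest message in that round). The names `split_even`, `two_stage`, `stage_time` and `max_entry` are hypothetical. Under this scheme every pairwise message in either stage has fewer than $t/p + p$ elements, so each stage costs at most $p\tau + t\mu + p^2\mu$ and both stages together at most $2t\mu + 2p\tau + 2p^2\mu$; the extra terms become lower order when $t$ is large relative to $p^2$ and $p\tau/\mu$.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def split_even(m: int, p: int, rot: int) -> List[int]:
    """Split m elements into p pieces whose sizes differ by at most one.

    The m % p leftover elements go to pieces rot, rot + 1, ... (mod p),
    which spreads the rounding surplus across intermediates.
    """
    base, extra = divmod(m, p)
    pieces = [base] * p
    for r in range(extra):
        pieces[(rot + r) % p] += 1
    return pieces


def two_stage(sizes: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical two-stage decomposition of a traffic matrix.

    sizes[i][j] is the number of elements processor i sends to j.
    Returns (stage1, stage2), where stage1[i][k] is traffic from source i
    to intermediate k and stage2[k][j] is traffic from intermediate k to
    final destination j.
    """
    p = len(sizes)
    stage1 = [[0] * p for _ in range(p)]
    stage2 = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            # Piece k of message (i -> j) travels i -> k -> j.
            for k, s in enumerate(split_even(sizes[i][j], p, (i + j) % p)):
                stage1[i][k] += s
                stage2[k][j] += s
    return stage1, stage2


def max_entry(mat: Matrix) -> int:
    """Largest single pairwise message in a traffic matrix."""
    return max(max(row) for row in mat)


def stage_time(mat: Matrix, tau: float, mu: float) -> float:
    """Assumed linear cost model for one all-to-all exchange.

    In round r, processor i sends mat[i][(i + r) % p]; each processor sends
    and receives exactly one message per round. A round costs
    tau + mu * (largest message in that round). Round 0 is the local copy
    and is charged like the others, so this is an upper bound.
    """
    p = len(mat)
    total = 0.0
    for r in range(p):
        largest = max(mat[i][(i + r) % p] for i in range(p))
        total += tau + mu * largest
    return total


if __name__ == "__main__":
    p, t = 5, 100
    # Bad case for a direct shift schedule: processor i sends all t
    # elements to 2i mod p, so every round carries one t-element message.
    sizes = [[0] * p for _ in range(p)]
    for i in range(p):
        sizes[i][(2 * i) % p] = t
    s1, s2 = two_stage(sizes)
    print(stage_time(sizes, 0.0, 1.0))                          # 500.0
    print(stage_time(s1, 0.0, 1.0) + stage_time(s2, 0.0, 1.0))  # 200.0
```

In the demo (with $\tau = 0$ and $\mu = 1$), the direct schedule pays $p \cdot t\mu$ because a full-size message lands in every round, whereas the two-stage version pays $2t\mu$, since each stage moves only $t/p$ elements per pair. For patterns that already fit the shift schedule well, such as a single permutation aligned with it, the direct exchange can be cheaper; the appeal of the two-stage approach is a bound that does not depend on the communication pattern.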