Collective Communication Optimization with Dynamic Programming in Heterogeneous Cluster Environments

A network of workstations (NOW) is a cost-effective alternative to massively parallel supercomputers. However, the heterogeneity of such a cluster complicates the design of efficient collective communication protocols. We demonstrate that a complex reduction scheduling problem can be transformed into a tree path length optimization problem, and we develop dynamic programming techniques that solve this problem in pseudo-polynomial time. We then develop further techniques that reduce the time complexity to O(fs log s). The same approach can be applied to any cluster with a fixed number of processor classes and a sender-receiver model in which the transmission cost is determined by both the sender and the receiver.
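To make the sender-receiver cost model concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a hypothetical cost table indexed by (sender class, receiver class), assumes that a node receives its children's messages one at a time in a given order, and assumes that a node can forward its partial result only after all of its children have been received. The names `reduction_time`, `children`, `proc_class`, and `cost` are illustrative only.

```python
from typing import Dict, List


def reduction_time(
    node: int,
    children: Dict[int, List[int]],
    proc_class: Dict[int, str],
    cost: Dict[str, Dict[str, int]],
) -> int:
    """Time at which `node` holds the reduced value of its whole subtree.

    Assumed model: a leaf is ready at time 0; a parent receives from its
    children sequentially, in the listed order, and each transfer costs
    cost[sender_class][receiver_class].
    """
    t = 0
    for child in children.get(node, []):
        child_ready = reduction_time(child, children, proc_class, cost)
        # The transfer starts once the parent is free and the child is ready.
        t = max(t, child_ready) + cost[proc_class[child]][proc_class[node]]
    return t


# Hypothetical two-class cluster: cost depends on both sender and receiver.
cost = {
    "fast": {"fast": 1, "slow": 2},
    "slow": {"fast": 3, "slow": 4},
}
proc_class = {0: "fast", 1: "slow", 2: "fast", 3: "fast"}

# Same tree shape, two different receive orders at the root (node 0).
order_a = {0: [1, 2], 1: [3]}
order_b = {0: [2, 1], 1: [3]}

time_a = reduction_time(0, order_a, proc_class, cost)
time_b = reduction_time(0, order_b, proc_class, cost)
```

Under these assumptions the two receive orders finish at different times for the same tree, which illustrates why the schedule has to be optimized rather than chosen arbitrarily.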
