Efficient collective communication on heterogeneous networks of workstations

Networks of Workstations (NOW) have become an attractive alternative platform for high performance computing. Due to the commodity nature of workstations and interconnects, and due to the multiplicity of vendors and platforms, NOW environments are gradually being redefined as Heterogeneous Networks of Workstations (HNOW) environments. This paper presents a new framework for efficiently implementing collective communication operations, as defined by the Message Passing Interface (MPI) standard, in the emerging HNOW environments. We first classify the different types of heterogeneity in HNOW and then focus on one important characteristic: the communication capabilities of the workstations. Taking this characteristic into account, we propose two new approaches, Speed-Partitioned Ordered Chain (SPOC) and Fastest-Node First (FNF), to implement collective communication operations with reduced latency. We also investigate methods for deriving optimal trees for broadcast and multicast operations. Generating such trees is shown to be computationally intensive. The FNF approach, in spite of its simplicity, is shown to deliver performance within 1% of that of the optimal trees. Finally, these new approaches are compared with the approach used in the MPICH implementation, on both experimental and simulated testbeds. On an existing 24-node HNOW environment with SGI workstations and ATM interconnection, our approaches reduce the latency of broadcast and multicast operations by a factor of up to 3.5 compared to the approach used in the existing MPICH implementation. On a 64-node simulated testbed, our approaches reduce the latency of broadcast and multicast operations by a factor of up to 4.5. These results demonstrate that our approaches have significant potential for designing scalable collective communication libraries for current and future generation HNOW environments.
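The abstract names the FNF approach without spelling out its steps. The following is a minimal sketch of one greedy reading of "Fastest-Node First" for broadcast, not the authors' implementation. It assumes a simplified cost model in which a send by node `i` takes `send_cost[i]` time units and the receiver holds the message once that send completes. The function name `fnf_broadcast_schedule`, the `send_cost` mapping, and the cost model itself are illustrative assumptions.

```python
import heapq


def fnf_broadcast_schedule(send_cost, root):
    """Greedy broadcast schedule that serves the fastest receivers first.

    send_cost: dict mapping node -> time that node needs to send one message
               (assumed cost model; receive overhead and wire latency ignored).
    root:      node that initially holds the message.
    Returns (schedule, completion_time), where schedule is a list of
    (sender, receiver, finish_time) tuples in the order they were chosen.
    """
    # Receivers ordered from fastest (lowest send cost) to slowest.
    receivers = sorted(
        (n for n in send_cost if n != root),
        key=lambda n: (send_cost[n], n),
    )

    # Min-heap of (time at which the node would finish its next send, node)
    # over all nodes that already hold the message.
    ready = [(send_cost[root], root)]
    schedule = []
    completion = 0

    for receiver in receivers:
        # Pick the holder that can complete a send the earliest.
        finish, sender = heapq.heappop(ready)
        schedule.append((sender, receiver, finish))
        completion = max(completion, finish)
        # The sender may start another send as soon as this one ends.
        heapq.heappush(ready, (finish + send_cost[sender], sender))
        # The receiver now holds the message and can start sending too.
        heapq.heappush(ready, (finish + send_cost[receiver], receiver))

    return schedule, completion


if __name__ == "__main__":
    # Hypothetical four-node HNOW: two fast nodes and two slow ones.
    costs = {"a": 1, "b": 1, "c": 4, "d": 4}
    sched, total = fnf_broadcast_schedule(costs, root="a")
    for sender, receiver, finish in sched:
        print(f"{sender} -> {receiver} finishes at t={finish}")
    print("broadcast completes at t =", total)
```

Under this model, the fast nodes receive the message early and then act as additional senders, so the slow nodes end up as leaves of the broadcast tree. That is the intuition behind serving the fastest nodes first.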
