Optimizing the steady-state throughput of scatter and reduce operations on heterogeneous platforms

In this paper, we consider the communications involved by the execution of a complex application, deployed on a heterogeneous large-scale distributed platform. Such applications intensively use collective macro-communication schemes, such as scatters, personalized all-to-alls or gather/reduce operations. Rather than aiming at minimizing the execution time of a single macro-communication, we focus on the steady-state operation. We assume that there is a large number of macro-communications to perform in pipeline fashion, and we aim at maximizing the throughput, i.e., the (rational) number of macro-communications which can be initiated every time-step. We target heterogeneous platforms, modeled by a graph where resources have different communication and computation speeds. The situation is simpler for series of scatters or personalized all-to-alls than for series of reduces operations, because of the possibility of combining various partial reductions of the local values, and of interleaving computations with communications. In all cases, we show how to determine the optimal throughput, and how to exhibit a concrete periodic schedule that achieves this throughput.

[1]  Jeffrey B. Sidney,et al.  Scheduling in broadcast networks , 1998 .

[2]  Ramesh Subramonian,et al.  LogP: towards a realistic model of parallel computation , 1993, PPOPP '93.

[3]  Pangfeng Liu,et al.  Broadcast scheduling optimization for heterogeneous cluster systems , 2000, SPAA '00.

[4]  Alexander Schrijver,et al.  Combinatorial optimization. Polyhedra and efficiency. , 2003 .

[5]  Bruce Lowekamp,et al.  ECO: Efficient Collective Operations for communication on heterogeneous networks , 1996, Proceedings of International Conference on Parallel Processing.

[6]  Henri E. Bal,et al.  MagPIe: MPI's collective communication operations for clustered wide area systems , 1999, PPoPP '99.

[7]  Da-Wei Wang,et al.  Reduction optimization in heterogeneous cluster environments , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.

[8]  Yves Robert,et al.  Pipelining broadcasts on heterogeneous platforms , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..

[9]  Jack Dongarra,et al.  MPI: The Complete Reference , 1996 .

[10]  Dhabaleswar K. Panda,et al.  Communication modeling of heterogeneous networks of workstations for performance characterization of collective operations , 1999, Proceedings. Eighth Heterogeneous Computing Workshop (HCW'99).

[11]  Paul D. Gader,et al.  Image algebra techniques for parallel image processing , 1987 .

[12]  Kenneth L. Calvert,et al.  Modeling Internet topology , 1997, IEEE Commun. Mag..

[13]  Susumu Shibusawa,et al.  Scheduling algorithms for efficient gather operations in distributed heterogeneous systems , 2000, Proceedings 2000. International Workshop on Parallel Processing.

[14]  David Gamarnik,et al.  Asymptotically Optimal Algorithms for Job Shop Scheduling and Packet Routing , 1999, J. Algorithms.

[15]  Fukuhito Ooshita,et al.  Efficient gather operation in heterogeneous cluster systems , 2002, Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications.

[16]  John H. Reif,et al.  Synthesis of Parallel Algorithms , 1993 .

[17]  Burkhard Monien,et al.  International Parallel and Distributed Processing Symposium (IPDPS 2004) , 2006 .

[18]  Henri Casanova,et al.  Simgrid: a toolkit for the simulation of application scheduling , 2001, Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid.

[19]  Yves Robert,et al.  Assessing the impact and limits of steady-state scheduling for mixed task and data parallelism on heterogeneous platforms , 2004, Third International Symposium on Parallel and Distributed Computing/Third International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks.

[20]  Henri Casanova,et al.  Scheduling distributed applications: the SimGrid simulation framework , 2003, CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings..

[21]  Viktor K. Prasanna,et al.  Efficient collective communication in distributed heterogeneous systems , 1999, Proceedings. 19th IEEE International Conference on Distributed Computing Systems (Cat. No.99CB37003).

[22]  Robert A. van de Geijn,et al.  Global Combine Algorithms for 2-D Meshes with Wormhole Routing , 1995, J. Parallel Distributed Comput..

[23]  Yves Robert,et al.  Optimizing the steady-state throughput of scatter and reduce operations on heterogeneous platforms , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..

[24]  Dhabaleswar K. Panda,et al.  Efficient collective communication on heterogeneous networks of workstations , 1998, Proceedings. 1998 International Conference on Parallel Processing (Cat. No.98EX205).

[25]  Arjen K. Lenstra,et al.  A World Wide Number Field Sieve Factoring Record: On to 512 Bits , 1996, ASIACRYPT.

[26]  Robert A. van de Geijn,et al.  A Pipelined Broadcast for Multidimensional Meshes , 1995, Parallel Process. Lett..

[27]  Anthony Skjellum,et al.  A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard , 1996, Parallel Comput..

[28]  Kees Verstoep,et al.  Network performance-aware collective communication for clustered wide-area systems , 2001, Parallel Comput..

[29]  Ian T. Foster,et al.  MPICH-G2: A Grid-enabled implementation of the Message Passing Interface , 2002, J. Parallel Distributed Comput..

[30]  Viktor K. Prasanna,et al.  Adaptive Communication Algorithms for Distributed Heterogeneous Systems , 1999, J. Parallel Distributed Comput..

[31]  Pangfeng Liu Broadcast Scheduling Optimization for Heterogeneous Cluster Systems , 2002, J. Algorithms.

[32]  Ran Libeskind-Hadas,et al.  On Multicast Algorithms for Heterogeneous Networks of Workstations , 1989, J. Parallel Distributed Comput..