An efficient MPI_allgather for grids

Allgather is an important MPI collective communication operation. Most allgather algorithms have been designed for homogeneous, tightly coupled systems. Existing allgather algorithms for grid systems do not efficiently utilize the bandwidth available on the slow wide-area links of the grid. In this paper, we present an allgather algorithm for grids that efficiently utilizes wide-area bandwidth and is also wide-area optimal. Our algorithm also adapts to grid load dynamics, since it considers transient network characteristics when dividing the nodes into clusters. Our experiments on a real grid setup consisting of 3 sites show that our algorithm gives an average performance improvement of 52% over existing strategies.
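To make the setting concrete, the following is a minimal sketch in plain Python (no MPI library) of what allgather computes and of why grouping nodes into clusters reduces traffic on wide-area links. It is a generic two-level scheme written for illustration, not the algorithm presented in the paper. The function names, the `cluster_of` mapping, and the block-counting cost measure are all assumptions; in particular, the clustering is supplied by hand here rather than derived from measured network characteristics.

```python
from collections import defaultdict


def allgather_reference(contributions):
    """Allgather semantics: every rank ends with all blocks, in rank order."""
    return [list(contributions) for _ in contributions]


def flat_wide_area_blocks(cluster_of):
    """Wide-area block transfers if every rank fetched each remote block itself."""
    sizes = defaultdict(int)
    for cid in cluster_of:
        sizes[cid] += 1
    n = len(cluster_of)
    return sum(n - sizes[cid] for cid in cluster_of)


def hierarchical_allgather(contributions, cluster_of):
    """Simulate a two-level allgather (illustrative, hypothetical scheme).

    cluster_of[rank] is the cluster id of that rank (assumed given).
    Returns (per-rank results, number of blocks sent over wide-area links).
    """
    n = len(contributions)
    members = defaultdict(list)
    for rank in range(n):
        members[cluster_of[rank]].append(rank)

    # Phase 1: each cluster collects its own blocks (local links only).
    local_blocks = {
        cid: {r: contributions[r] for r in ranks}
        for cid, ranks in members.items()
    }

    # Phase 2: clusters exchange their collected blocks over the wide area.
    wide_area_blocks = 0
    full = {}
    for cid in members:
        merged = dict(local_blocks[cid])
        for other, blocks in local_blocks.items():
            if other != cid:
                merged.update(blocks)
                wide_area_blocks += len(blocks)
        full[cid] = merged

    # Phase 3: each cluster distributes the complete result locally.
    result = [None] * n
    for cid, ranks in members.items():
        ordered = [full[cid][r] for r in range(n)]
        for r in ranks:
            result[r] = list(ordered)
    return result, wide_area_blocks


if __name__ == "__main__":
    data = ["a", "b", "c", "d", "e"]
    clusters = [0, 0, 1, 1, 2]  # hypothetical 3-site layout
    result, wan = hierarchical_allgather(data, clusters)
    print(result[0], wan, flat_wide_area_blocks(clusters))
```

In this toy layout each block crosses a wide-area link once per remote cluster instead of once per remote node, which is the kind of saving a cluster-aware allgather aims for. The sketch counts blocks only and does not model latency, bandwidth, or changing load.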
