Message Strip-Mining Heuristics for High Speed Networks

In this work we investigate how the compiler technique of message strip-mining performs in practice on contemporary high performance networks. Message strip-mining attempts to reduce the overall cost of communication in parallel programs by breaking up large message transfers into smaller ones that can be overlapped with computation. In practice, however, network resource constraints may negate the expected performance gains. By deriving a performance model and synthetic benchmarks we determine how network and application characteristics influence the applicability of this optimization. We use these findings to determine heuristics to follow when performing this optimization on parallel programs. We propose strip-mining with variable block size as an alternative strategy that performs almost as well as a highly tuned fixed block strategy and has the advantage of being performance portable across systems and application input sets. We evaluate both techniques using synthetic benchmarks and an application from the NAS Parallel Benchmark suite.
