Effective communication coalescing for data-parallel applications

Communication coalescing is a static optimization that can reduce both communication frequency and redundant data transfer in compiler-generated code for regular, data-parallel applications. We present an algorithm for coalescing the communication that arises when generating code for regular, data-parallel applications written in High Performance Fortran (HPF). To handle sophisticated computation partitionings, our algorithm normalizes communication before attempting coalescing. We experimentally evaluate our algorithm, which is implemented in the dHPF compiler, by compiling HPF versions of the NAS application benchmarks SP, BT, and LU. Our normalized coalescing algorithm improves the performance and scalability of the compiler-generated code for these benchmarks, reducing communication volume by up to 55% compared to a simpler coalescing strategy, and enables us to match the communication volume and frequency of hand-optimized MPI implementations of these codes.
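To make the core idea concrete, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not the dHPF algorithm: the compiler reasons about communication using integer sets over processor and data spaces, whereas this sketch models each communication requirement as a (destination rank, lo, hi) index interval and merges overlapping or adjacent intervals bound for the same destination. The coalesce helper and the interval representation are hypothetical names introduced only for this example.

    from collections import defaultdict

    # Illustrative sketch of communication coalescing (assumed interval
    # model, not dHPF's integer-set machinery): merge overlapping or
    # adjacent index intervals per destination rank, so each neighbor
    # receives one combined message instead of several partially
    # redundant smaller ones.
    def coalesce(requests):
        by_dest = defaultdict(list)
        for dest, lo, hi in requests:
            by_dest[dest].append((lo, hi))
        coalesced = []
        for dest, intervals in sorted(by_dest.items()):
            intervals.sort()
            merged = [list(intervals[0])]
            for lo, hi in intervals[1:]:
                if lo <= merged[-1][1] + 1:  # overlaps or abuts the previous interval
                    merged[-1][1] = max(merged[-1][1], hi)
                else:
                    merged.append([lo, hi])
            coalesced.extend((dest, lo, hi) for lo, hi in merged)
        return coalesced

    # Three overlapping requests to rank 1 collapse into a single message.
    requests = [(1, 0, 7), (1, 4, 11), (1, 12, 15), (2, 0, 3)]
    print(coalesce(requests))  # -> [(1, 0, 15), (2, 0, 3)]

In the paper's setting, the analogous merging operates on sets describing the data each processor must send, and the normalization step mentioned above rewrites communication generated under complex computation partitionings into a common form so that such merging opportunities become visible.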
