Compiler optimization techniques for OpenMP programs

We have developed compiler optimization techniques for explicitly parallel programs written with the OpenMP API. To enable optimization across threads, we designed dataflow analysis techniques that effectively model interactions between threads. The structured description of parallelism and the relaxed memory consistency model of OpenMP make these analyses both effective and efficient. We developed algorithms for reaching definitions analysis, memory synchronization analysis, and cross-loop data dependence analysis for parallel loops. Our primary targets are compiler-directed software distributed shared memory (DSM) systems, in which aggressive compiler optimization of software-implemented coherence schemes is crucial to obtaining good performance. We also developed optimizations applicable to general OpenMP implementations, namely redundant barrier removal and privatization of dynamically allocated objects. Experimental results for the coherence optimization show that aggressive compiler optimization is highly effective for a shared-write-intensive program, because the coherence-induced communication volume in such a program is much larger than in shared-read-intensive programs.
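To make the link between cross-loop data dependence analysis and redundant barrier removal concrete, here is a minimal C sketch of one case such an analysis can detect. It is an illustration under stated assumptions, not the authors' algorithm: the arrays `a` and `b`, the size `N`, and the functions `two_loops_with_barrier` and `two_loops_without_barrier` are hypothetical. It relies on the OpenMP (3.0 and later) guarantee that two `schedule(static)` loops with the same iteration count, no chunk size, and the same enclosing parallel region assign each iteration to the same thread. Because iteration `i` of the second loop reads only the element written by iteration `i` of the first, the dependence never crosses threads and the intervening barrier can be removed with `nowait`.

```c
#include <assert.h>

/* Hypothetical problem size and arrays, chosen only for illustration. */
#define N 1024
static double a[N], b[N];

/* Before: the implicit barrier after the first worksharing loop is
   kept, although iteration i of the second loop reads only a[i],
   which iteration i of the first loop wrote. */
void two_loops_with_barrier(int n)
{
    assert(n >= 0 && n <= N);
    #pragma omp parallel
    {
        #pragma omp for schedule(static)
        for (int i = 0; i < n; i++)
            a[i] = 2.0 * i;
        /* implicit barrier: every thread waits here */

        #pragma omp for schedule(static)
        for (int i = 0; i < n; i++)
            b[i] = a[i] + 1.0;
    } /* implicit barrier at the end of the parallel region */
}

/* After: both loops use schedule(static) with no chunk size, have the
   same iteration count, and bind to the same parallel region, so
   iteration i runs on the same thread in both loops. The cross-loop
   dependence on a[i] stays within one thread, so nowait is safe. */
void two_loops_without_barrier(int n)
{
    assert(n >= 0 && n <= N);
    #pragma omp parallel
    {
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; i++)
            a[i] = 2.0 * i;
        /* no barrier: each thread proceeds directly to its own
           iterations of the next loop */

        #pragma omp for schedule(static)
        for (int i = 0; i < n; i++)
            b[i] = a[i] + 1.0;
    } /* the second loop's barrier and the region's end still ensure
         b is complete on return */
}
```

Compiled without OpenMP support, the pragmas are ignored and both functions run sequentially with identical results; with OpenMP enabled (for example, `-fopenmp` in GCC), the second version avoids one barrier per call. The transformation is valid only because the dependence is iteration-aligned: if the second loop read `a[i + 1]`, the value could come from another thread and the barrier would be required.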
