Analyses for the Translation of OpenMP Codes into SPMD Style with Array Privatization

So-called SPMD-style OpenMP programs can achieve scalability on ccNUMA systems by means of array privatization, and earlier research has shown good performance under this approach. Since SPMD-style OpenMP code is hard to write by hand, our previous work presented a strategy for automatically translating many OpenMP constructs into SPMD style. In this paper, we first explain how to detect interprocedurally whether an OpenMP program schedules its parallel loops consistently. If the parallel loops are consistently scheduled, array privatization can be carried out in accordance with OpenMP semantics. We also give two examples of code patterns that can be handled even though they are not consistently scheduled; the strategy for translating them differs from the straightforward approach that applies otherwise.
