Many scientific applications involve array operations that are sparse in nature, i.e. each array element depends on the values of relatively few elements of the same or another array. When such operations are parallelised in the shared-memory model, inter-thread dependencies often arise which require that individual array updates be protected in some way. Possible strategies include protecting every update, or having each thread compute local temporary results which are then combined globally across threads. However, for the extremely common case of sparse array access, neither approach is particularly efficient. The key observation is that data access patterns usually remain constant for long periods, so an inspector/executor approach is possible: when the sparse operation is first encountered, the access pattern is inspected to identify those updates with potential inter-thread dependencies, and whenever the code is subsequently executed, only these selected updates are protected. We propose a new OpenMP clause, {\tt indirect}, for parallel loops with irregular data access patterns. The clause is trivial to implement in a conforming way by protecting every array update, but it also permits an inspector/executor compiler implementation which is more efficient in sparse cases. We describe efficient compiler implementation strategies for the new clause, and present timings from the kernels of a Discrete Element Modelling application and a Finite Element code in which the inspector/executor approach is used. The results demonstrate that the method can be extremely efficient in practice.
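To make the idea concrete, the following sketch (in C with OpenMP) shows how a sparse scatter-add loop might be annotated with the proposed {\tt indirect} clause, together with one possible inspector/executor realisation of it. This is a minimal illustration, not the paper's implementation: the function names, the assumption that {\tt schedule(static)} assigns contiguous blocks of roughly $n/\mathit{nthreads}$ iterations (common in practice, but formally implementation-defined), and the choice of {\tt omp atomic} as the protection mechanism are all illustrative assumptions.

\begin{verbatim}
#include <stdlib.h>
#include <omp.h>

/* The annotated source loop (clause syntax illustrative only):
 *
 *   #pragma omp parallel for indirect
 *   for (int i = 0; i < n; i++)
 *       x[idx[i]] += contrib[i];
 */

/* Inspector: under a static block distribution of n iterations
 * over nthreads threads, flag every element of x written by more
 * than one thread.  conflict[e] != 0 means updates to x[e] must
 * be protected in the executor. */
static char *inspect(const int *idx, int n, int nelem, int nthreads)
{
    char *conflict = calloc(nelem, sizeof(char));
    int  *owner    = malloc(nelem * sizeof(int));
    for (int e = 0; e < nelem; e++) owner[e] = -1;

    int chunk = (n + nthreads - 1) / nthreads;  /* block size */
    if (chunk == 0) chunk = 1;

    for (int i = 0; i < n; i++) {
        int t = i / chunk;             /* thread owning iteration i */
        int e = idx[i];
        if (owner[e] < 0)        owner[e] = t;
        else if (owner[e] != t)  conflict[e] = 1;  /* shared element */
    }
    free(owner);
    return conflict;
}

/* Executor: plain updates where the inspector proved single-thread
 * access, protected updates for the (few) conflicting elements. */
void executor(double *x, const double *contrib, const int *idx,
              int n, const char *conflict)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        int e = idx[i];
        if (conflict[e]) {
            #pragma omp atomic
            x[e] += contrib[i];
        } else {
            x[e] += contrib[i];
        }
    }
}
\end{verbatim}

The inspector need only be rerun when {\tt idx} changes; since, as noted above, the access pattern typically remains constant for long periods, its cost is amortised over many executions of the loop.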