OpenMP is one of the most widely used parallel programming techniques in the modern multi-core era. Parallelizing a loop with OpenMP is as simple as adding a few directives. Because of this simplicity, however, programmers often overuse OpenMP when parallelizing loops, which introduces excessive overhead and degrades performance. This paper establishes a performance model for OpenMP-parallelized loops that addresses the critical factors influencing performance. The model is validated through experiments on three different multi-core platforms. The results show that the best performance is obtained when the number of threads used by an OpenMP application equals the number of cores available on the platform, and that parallelizing the outermost loop of a loop nest yields higher speedup.
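The two findings can be illustrated with a minimal sketch, which is not the code used in the experiments. It assumes a C compiler with OpenMP support (for example, one invoked with `-fopenmp`); the function names `threads_per_core` and `matvec_outer` are hypothetical and chosen only for illustration. The directive is placed on the outermost loop of a matrix-vector product, and the requested thread count is taken from `omp_get_num_procs()`, which reports the number of processors available to the program.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: request one thread per available core.
 * Falls back to 1 when the code is compiled without OpenMP. */
int threads_per_core(void) {
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

/* Computes y = A * x for a row-major n-by-n matrix A.
 * Only the outermost loop carries the directive, so the parallel
 * region is entered once rather than once per row. */
void matvec_outer(const double *a, const double *x, double *y, int n) {
    assert(n > 0);
    const int nthreads = threads_per_core();
    (void)nthreads; /* silences an unused warning in non-OpenMP builds */

    #pragma omp parallel for num_threads(nthreads) schedule(static)
    for (int i = 0; i < n; ++i) {
        double sum = 0.0; /* private to each iteration, no race */
        for (int j = 0; j < n; ++j) {
            sum += a[(size_t)i * n + j] * x[j];
        }
        y[i] = sum;
    }
}
```

Moving the directive to the inner `j` loop would instead require a reduction on `sum` and would pay the fork/join cost on every row, which is the kind of overhead the model is meant to capture.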