Performance model for OpenMP parallelized loops

OpenMP is one of the most widely used parallel programming techniques in the modern multi-core era. Parallelizing a loop with OpenMP is as simple as adding a few directives. Because of this simplicity, however, programmers often overuse OpenMP to parallelize loops in various applications, which introduces excessive overhead and degrades performance. This paper establishes a performance model for OpenMP-parallelized loops that identifies the critical factors influencing performance. The model is validated through experiments on three different multi-core platforms. The results show that the best performance is obtained when the number of threads used in an OpenMP application equals the number of cores available on the platform, and that parallelizing the outermost loop of a nested loop yields higher speedup.
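
As an illustration of the two findings above, the following minimal sketch in C places a single directive on the outermost loop of a nested loop and requests one thread per available core. The function names threads_for_platform and matrix_sum are hypothetical and chosen only for this example; they are not taken from the paper's experiments. The code assumes an OpenMP-capable compiler (for example, the -fopenmp flag with GCC) and falls back to serial execution when OpenMP is not enabled.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: request one thread per core available on the platform.
   Returns 1 when the code is compiled without OpenMP support. */
int threads_for_platform(void) {
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

/* Hypothetical example: sum all elements of a rows x cols row-major matrix.
   Only the outermost loop carries the directive, so the parallel region is
   entered once instead of once per row. */
double matrix_sum(const double *a, int rows, int cols) {
    double total = 0.0;
    int nthreads = threads_for_platform();
    (void)nthreads; /* silences the unused-variable warning without OpenMP */

    #pragma omp parallel for num_threads(nthreads) reduction(+:total)
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            total += a[(size_t)i * cols + j];
        }
    }
    return total;
}
```

Moving the directive to the inner loop would create and synchronize a thread team once per outer iteration, which is one plausible source of the overhead that the abstract attributes to excessive use of OpenMP.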