Performance Analysis of Built-in Parallel Reduction's Implementation in OpenMP C/C++ Language Extension

Parallel reduction algorithms are common in high-performance computing, so modern parallel programming toolkits and languages often provide built-in support for them. This article discusses important aspects of the implementation of built-in parallel reduction in the well-known OpenMP C/C++ language extension. It shows that the implementation in the widely used GCC compiler is not efficient and proposes a custom reduction implementation that improves computational performance.
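The sketch below contrasts OpenMP's built-in `reduction(+:sum)` clause with one hand-written alternative. In the alternative, each thread first accumulates a private partial sum, and the partials are then combined pairwise in a binary tree in about log2(p) barrier-separated rounds for p threads. This tree scheme is an illustrative assumption and not necessarily the custom implementation the article proposes. The function names `sum_builtin` and `sum_tree` are hypothetical. The `_OPENMP` guards let the code also compile and run serially without `-fopenmp`.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Built-in reduction: OpenMP gives every thread a private copy of `sum`
   (initialized to 0 for +) and combines the copies when the loop ends. */
long long sum_builtin(const int *a, long n)
{
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hypothetical hand-written reduction (illustration only, not the article's
   code): per-thread partial sums combined pairwise in a binary tree. */
long long sum_tree(const int *a, long n)
{
    int p = 1;
#ifdef _OPENMP
    p = omp_get_max_threads();
#endif
    long long *partial = calloc((size_t)p, sizeof *partial);
    if (partial == NULL)
        abort();

    #pragma omp parallel num_threads(p)
    {
        int t = 0, nt = 1;
#ifdef _OPENMP
        t = omp_get_thread_num();
        nt = omp_get_num_threads(); /* may be fewer than p */
#endif
        long long local = 0;

        /* Phase 1: each thread sums its share of the iterations. */
        #pragma omp for
        for (long i = 0; i < n; i++)
            local += a[i];
        partial[t] = local;
        #pragma omp barrier

        /* Phase 2: pairwise combination. Every thread runs the same number
           of rounds, so all threads reach each barrier. */
        for (int stride = 1; stride < nt; stride *= 2) {
            if (t % (2 * stride) == 0 && t + stride < nt)
                partial[t] += partial[t + stride];
            #pragma omp barrier
        }
    }

    long long sum = partial[0];
    free(partial);
    return sum;
}
```

Compile with, for example, `gcc -O2 -fopenmp`. Whether the tree version is actually faster than the built-in clause depends on the thread count, the compiler, and the OpenMP runtime, so it should be measured rather than assumed.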
