Mixed Mode Applications on HPCx

Five different OpenMP implementations of a simple Jacobi algorithm have been developed and their performance compared and contrasted. Comparing the best of these codes with an equivalent pure MPI implementation shows that the OpenMP code is considerably faster, mainly because of its direct reads from and writes to shared memory. However, an equivalent mixed OpenMP/MPI version of the code performs worse than the pure MPI code. While the collective operations are faster, partly because fewer processes are involved in each MPI call, all of the application sections are slower, owing to the overhead of the threads' shared-memory communication and to cache problems. In addition, the mixed point-to-point communications are slower than in the pure MPI code, because threads must re-cache data after MPI calls and because communication traffic between nodes becomes more dominant.