Experiences using collective communication in a parallel CFD industrial code

Communication primitives for message-passing parallel computing may be classified as either point-to-point, involving a single source and a single destination, or collective, involving more than two processes. Reduce is, in some sense, the inverse of broadcast. It can therefore be given an optimal solution by reversing the direction of the messages in an optimal broadcast. The bcast (broadcast) algorithm cannot be used directly to implement a reduction, however, because a reduction requires a priori knowledge of the role that each node plays in the collective operation.

In the parallel CFD code, local communication occurs whenever a processor needs to exchange data with its neighbors across a common interprocessor boundary. Global communication is needed at the completion of each iteration cycle, when each processor must forward its residual to a master processor that performs a global convergence check. Global communication is also required within the linear solver to orthogonalize the residual vectors during the convergence acceleration step.
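The broadcast/reduce duality and the master-side convergence check can be illustrated with a small, purely sequential simulation. This is a minimal sketch under stated assumptions, not the method used in the code described here. The source does not say which broadcast algorithm is used, so a binomial tree is assumed. It is round-optimal under a simple model in which each process sends at most one message per round. Ranks are list indices, and rank 0 serves as both the broadcast root and the master. The global residual is assumed to be the L2 norm assembled from local squared norms. All function names (`bcast_schedule`, `reduce_schedule`, `simulate_bcast`, `simulate_reduce`, `global_convergence_check`) are hypothetical, and no real message-passing library is involved.

```python
import math
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
Message = Tuple[int, int]  # (source rank, destination rank)


def bcast_schedule(p: int) -> List[List[Message]]:
    """Rounds of a binomial-tree broadcast from rank 0 (assumed algorithm).

    With step s = 1, 2, 4, ..., every rank r < s already holds the data
    and sends it to rank r + s, so ceil(log2 p) rounds reach all p ranks.
    """
    rounds: List[List[Message]] = []
    step = 1
    while step < p:
        rounds.append([(src, src + step) for src in range(step) if src + step < p])
        step *= 2
    return rounds


def reduce_schedule(p: int) -> List[List[Message]]:
    """Reduce = the broadcast schedule run backwards with every message reversed.

    Each rank's role (when it receives partial results and when it forwards
    its own) is fixed in advance by this schedule.
    """
    return [[(dst, src) for (src, dst) in rnd] for rnd in reversed(bcast_schedule(p))]


def simulate_bcast(values: List[T]) -> List[T]:
    """Copy rank 0's value to every rank following the broadcast schedule."""
    data = list(values)
    for rnd in bcast_schedule(len(data)):
        for src, dst in rnd:
            data[dst] = data[src]
    return data


def simulate_reduce(values: List[T], op: Callable[[T, T], T]) -> T:
    """Combine all ranks' values at rank 0 following the reversed schedule."""
    partial = list(values)
    for rnd in reduce_schedule(len(partial)):
        for src, dst in rnd:
            # In each round senders and receivers are disjoint, so every
            # sender already holds the combined result of its subtree.
            partial[dst] = op(partial[dst], partial[src])
    return partial[0]


def global_convergence_check(local_sq_norms: List[float], tol: float) -> bool:
    """Master-side test: reduce squared local residual norms, then compare the L2 norm."""
    global_norm = math.sqrt(simulate_reduce(local_sq_norms, lambda a, b: a + b))
    return global_norm < tol


if __name__ == "__main__":
    print(bcast_schedule(5))
    print(reduce_schedule(5))
    print(global_convergence_check([0.01, 0.04], tol=0.5))
```

For five ranks the broadcast sends 0→1, then 0→2 and 1→3, then 0→4. The reduce runs the same messages in reverse order: 4→0, then 2→0 and 3→1, then 1→0. Unlike the broadcast, where a rank can simply forward whatever it receives, each rank in the reduce must know beforehand how many partial results to wait for before forwarding its own.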