A PARALLEL VARIANT OF GMRES(m)

In the usual implementation of GMRES(m) [3], the computationally most expensive part is the Modified Gram-Schmidt (MGS) process. MGS does not parallelize well on distributed-memory multiprocessors, since the inner products act as synchronization points and thus require communication that cannot be overlapped with computation. Furthermore, because all orthogonalizations must be done sequentially, MGS generates a large number of short messages, which is relatively expensive. Especially on large processor grids, the time spent in communication in the MGS process may be significant. For this reason a variant of the usual GMRES(m) algorithm, called modGMRES(m), is considered; it first generates the vectors that span the Krylov space and then combines the MGS steps for a group of vectors. It is shown, on real-world problems, that modGMRES(m) can yield a considerable gain in time per iteration. Numerical experience suggests that the total number of iterations remains about the same as for