Conjugate-Gradients Algorithms on a CRAY-T3D

Conjugate-gradient algorithms are recognized as competitive among the available fast iterative schemes for the solution of large-scale linear systems. This paper presents a parallel implementation, on distributed-memory architectures using the SPMD programming paradigm, of the CGS and BiCGSTAB methods combined with the most popular algebraic preconditioners. We analyze the programming environment that the Cray-T3D supplies for data communication and data distribution, and we compare the performance of two versions of our code: one based on the PVM message-passing interface and the other on the Shared Memory Access Library. We also investigate the influence of a block-cyclic partitioning on the performance of the algorithms, with particular attention to the incomplete LU factorization preconditioner. In particular, we address the tradeoff between minimizing interprocessor communication and exploiting the available parallelism through a suitable data distribution. Numerical experiments carried out with different matrix sizes show that the block-cyclic distribution gives satisfactory results provided the problem is sufficiently large.
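To make the block-cyclic partitioning concrete, the following is a minimal Python sketch of a one-dimensional block-cyclic mapping of matrix rows to processors. It is an illustration only and not the code used in the paper: the function names, the restriction to a 1-D row distribution, and the parameters `block_size` and `nprocs` are all assumptions introduced here.

```python
def block_cyclic_owner(i, block_size, nprocs):
    """Rank owning global row i (hypothetical helper, 1-D block-cyclic layout).

    Rows are grouped into consecutive blocks of `block_size` rows, and the
    blocks are dealt out to the ranks 0..nprocs-1 in round-robin order.
    """
    return (i // block_size) % nprocs


def block_cyclic_local_index(i, block_size, nprocs):
    """Position of global row i inside the local storage of its owner."""
    full_cycles = i // (block_size * nprocs)  # complete rounds before row i
    return full_cycles * block_size + i % block_size


def rows_owned_by(rank, n, block_size, nprocs):
    """All global rows of an n-row matrix assigned to `rank`."""
    return [i for i in range(n)
            if block_cyclic_owner(i, block_size, nprocs) == rank]


if __name__ == "__main__":
    # Assumed toy sizes: 8 rows, blocks of 2 rows, 2 processors.
    for rank in range(2):
        print(rank, rows_owned_by(rank, n=8, block_size=2, nprocs=2))
```

In this sketch, smaller blocks interleave rows more finely across processors, while larger blocks keep neighbouring rows on the same processor. The block size is therefore the parameter through which the tradeoff between parallelism and interprocessor communication is expressed.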