Computations with symmetric, positive definite and band matrices on a parallel vector processor

Abstract

Computations involving symmetric, positive definite and band matrices are kernel operations in the numerical treatment of many models arising in science and engineering. It is desirable to achieve a high level of performance when such operations are carried out on a vector processor. If the operations are performed by rows or by columns (as in the EXTENDED BLAS subroutines), then the loops are vectorized, but the speed of computation, measured in Mflops, is not very high, because the vectors involved are normally short. The computations should therefore be organized by diagonals. Furthermore, some special devices should be applied in order to unroll the loops. Finally, the storage scheme must be chosen with care. It is demonstrated that if (i) the computations are organized by diagonals, (ii) the main loops are unrolled and (iii) the storage scheme is such that work with some zero elements is avoided, then the speed of computation is nearly the same as that obtained with dense matrices. If a particular vector machine is in use (in our case a CRAY X-MP computer), then the speed can be increased further by (iv) coding some basic operations in machine language and (v) using the different processors of the vector computer in parallel. Numerical examples illustrate how efficiently these special features of the particular computer can be exploited. Kernel subroutines performing matrix-vector multiplications are described, and representative tests are used to demonstrate the efficiency of these kernels.
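The difference between the row-wise and the diagonal-wise organization can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the kernels described in the paper: the function names and the storage layout (a list `diags` in which `diags[k][i]` holds the element in row `i` of the `k`-th superdiagonal, so that no padding zeros are stored) are hypothetical choices made for this example. Each inner loop over a diagonal stands for one long vector operation of length `n - k`, whereas the inner loop of the row-wise version never exceeds `2m + 1` terms for half-bandwidth `m`.

```python
def band_matvec_by_diagonals(diags, x):
    """Return y = A x for a symmetric band matrix A stored by diagonals.

    Assumed (hypothetical) layout: diags[k][i] == A[i][i + k] == A[i + k][i]
    for k = 0..m and i = 0..n-k-1, so no zero padding is stored.
    """
    n = len(x)
    # Main diagonal: one vector operation of length n.
    y = [diags[0][i] * x[i] for i in range(n)]
    for k in range(1, len(diags)):
        d = diags[k]
        # Superdiagonal k: one vector operation of length n - k.
        for i in range(n - k):
            y[i] += d[i] * x[i + k]
        # Mirrored subdiagonal k (symmetry): again length n - k.
        for i in range(n - k):
            y[i + k] += d[i] * x[i]
    return y


def band_matvec_by_rows(diags, x):
    """Same product organized by rows; each inner loop has at most 2m+1 terms."""
    n, m = len(x), len(diags) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for j in range(max(0, i - m), min(n, i + m + 1)):
            # Element A[i][j] lives on diagonal |i - j| at position min(i, j).
            s += diags[abs(i - j)][min(i, j)] * x[j]
        y[i] = s
    return y
```

For the tridiagonal matrix with 2 on the main diagonal and -1 on the off-diagonals (`diags = [[2.0] * 5, [-1.0] * 4]`) and `x = [1.0, 2.0, 3.0, 4.0, 5.0]`, both functions return `[0.0, 0.0, 0.0, 0.0, 6.0]`. On a vector machine the two inner loops over a diagonal would each map to a single vector instruction sequence, and unrolling the outer loop over `k` would process several diagonals per pass over `y`.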
