Matrix Multiplication by Diagonals on a Vector/Parallel Processor

The advent of vector and parallel computers has forced the reexamination, reformulation, and rethinking of essentially all of the basic mathematical algorithms. In this paper, we will consider the seemingly straightforward process of matrix multiplication. We will be primarily concerned with this process on a vector processor, the CDC STAR-100. For large full matrices, matrix multiplication is easily "vectorized" when the matrix is stored by columns in the typical Fortran fashion. However, there are at least two disadvantages to this standard approach. First, it becomes quite inefficient for banded matrices with relatively narrow bandwidths. Second, when a matrix is stored by columns (or rows), its transpose is not as readily available for use in vector form (this is a particular problem on the STAR-100). The purpose of this paper is to present a new algorithm for matrix multiplication which is readily "vectorized", is very efficient for narrow banded matrices, and allows the transpose to be easily accessible in vector form.
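For orientation, the following is a minimal sketch (in C, with column-major indexing written out explicitly; the function name and loop organization are illustrative assumptions, not taken from the paper) of the column-oriented multiplication alluded to above. The innermost loop updates an entire column of C at once, which is the long contiguous vector operation that makes this scheme attractive on a machine such as the STAR-100.

```c
/* Sketch: column-oriented matrix multiply, C = A*B.
 * A, B, C are n-by-n and stored by columns (Fortran fashion),
 * so the (i,j) entry of A is A[i + j*n].
 * The innermost loop is a SAXPY-style update of column j of C,
 * i.e. a single vector operation of length n.
 */
void matmul_by_columns(int n, const double *A, const double *B, double *C)
{
    for (int j = 0; j < n; j++) {              /* column j of C and of B   */
        for (int i = 0; i < n; i++)
            C[i + j*n] = 0.0;                  /* clear column j of C      */
        for (int k = 0; k < n; k++) {          /* column k of A            */
            double b = B[k + j*n];
            for (int i = 0; i < n; i++)        /* C(:,j) += b * A(:,k)     */
                C[i + j*n] += b * A[i + k*n];
        }
    }
}
```

The two drawbacks noted above are visible in this sketch: for a banded matrix of bandwidth 2p+1 the columns hold only about 2p+1 nonzeros, so the vector operations become short and inefficient when p is small relative to n; and forming A^T*B with this storage requires either stride-n gathers or an explicit transposition, since the rows of A are not contiguous.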