The impact of memory organization on the performance of matrix multiplication

Matrix multiplication can serve as a model problem for analyzing the performance of more complex algorithms. On Cray and IBM computer systems, library routines for this task run at high megaflop rates. Other programs from numerical linear algebra do not always achieve this level of sophistication; for example, they may suffer from performance degradation caused by memory access conflicts. This effect is studied by considering the performance of matrix multiplication subroutines on the Cray X-MP, Cray Y-MP, and IBM 3090. The results are analyzed by means of simulation. On a Cray, the performance degradation caused by bank conflicts can be reduced if the stride of memory references is odd. The IBM 3090, with its more complex storage hierarchy, requires a more elaborate approach.
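The odd-stride observation can be illustrated with a toy model of interleaved memory. The sketch below is not the simulation used in the paper. It assumes a power-of-two number of banks, a fixed bank busy time, and one memory request issued per cycle. The bank count (64), the busy time (4 cycles), and the function names are illustrative assumptions, not Cray X-MP or Y-MP specifications.

```python
from math import gcd


def banks_touched(stride: int, n_banks: int) -> int:
    """Distinct banks visited by a constant-stride access stream.

    With word address i * stride mapped to bank (i * stride) % n_banks,
    the stream cycles through n_banks / gcd(stride, n_banks) banks.
    """
    return n_banks // gcd(stride, n_banks)


def cycles_for_stream(stride: int, n_banks: int, busy: int, n_accesses: int) -> int:
    """Cycles needed to issue n_accesses requests in the toy model.

    Assumptions (hypothetical, for illustration only): one request is
    issued per cycle, and a bank stays busy for `busy` cycles after each
    access. A request to a busy bank stalls until the bank is free.
    """
    free_at = [0] * n_banks  # cycle at which each bank becomes free
    clock = 0
    for i in range(n_accesses):
        bank = (i * stride) % n_banks
        clock = max(clock, free_at[bank])  # stall on a bank conflict
        free_at[bank] = clock + busy
        clock += 1
    return clock


def odd_leading_dimension(n: int) -> int:
    """Smallest odd value >= n, usable as a padded leading dimension."""
    return n if n % 2 == 1 else n + 1


if __name__ == "__main__":
    n_banks, busy, n_accesses = 64, 4, 1024  # assumed toy parameters
    for stride in (1, 2, 32, 64, 65):
        print(
            f"stride={stride:3d}  banks={banks_touched(stride, n_banks):2d}  "
            f"cycles={cycles_for_stream(stride, n_banks, busy, n_accesses)}"
        )
```

In this model an odd stride is coprime with the power-of-two bank count, so the stream visits every bank before returning to the first one. This is why padding the leading dimension of a matrix to an odd value, as in `odd_leading_dimension`, avoids the worst case when traversing a row of a column-major array. The model has no cache or other storage hierarchy, so it says nothing about the IBM 3090 case.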
