The impact of memory organization on the performance of matrix calculations

The memory organization of vector supercomputers is designed to sustain a high rate of data transfer between registers and main memory. Nevertheless, there are applications for which this link becomes a bottleneck. The bottleneck can be avoided by calling appropriate library software or by using programming techniques that take architectural features into account. This report examines the impact of memory access conflicts on the execution time of matrix calculations. Two variants of matrix multiplication serve as model problems: one accesses memory with stride one, the other with stride n. The CPU time of the two variants is analyzed by means of simulation. It is shown that the results also hold for the solution of linear equations, eigenvalue problems, and shortest-path problems, provided the algorithms are implemented analogously. The analysis covers computers with interleaved memory (CRAY X-MP, CRAY Y-MP, FUJITSU VP) and with hierarchical memory (IBM 3090). The results are also related to library software in order to point out the benefits a user may gain from highly optimized software. Moreover, it is demonstrated that multiple processors working in parallel on a shared memory may even increase the number of memory access conflicts.
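
The contrast between stride-one and stride-n access can be illustrated with a minimal sketch. It is not the simulation used in the report; the function names, the column-major flat storage, and the toy bank model (16 banks, each busy for 4 cycles after an access) are assumptions chosen for illustration only and do not describe any of the machines named above.

```python
def matmul_colmajor(a, b, n, variant):
    """Multiply two n x n matrices stored column-major in flat lists.

    variant="stride1": the innermost loop runs over i, so a(i,k) is read
                       down a column (address step 1).
    variant="striden": the innermost loop runs over k, so a(i,k) is read
                       along a row (address step n).
    Returns (c, strides); strides is the set of address steps through `a`
    observed between consecutive innermost-loop iterations.
    """
    c = [0.0] * (n * n)
    strides = set()
    for j in range(n):
        for p in range(n):
            prev = None
            for q in range(n):
                if variant == "stride1":
                    k, i = p, q
                else:
                    i, k = p, q
                addr = i + k * n          # column-major address of a(i,k)
                if prev is not None:
                    strides.add(addr - prev)
                prev = addr
                c[i + j * n] += a[addr] * b[k + j * n]
    return c, strides


def stall_cycles(stride, n_access, banks, busy):
    """Toy model of an interleaved memory (hypothetical parameters).

    One access is issued per cycle unless the target bank is still busy;
    a bank stays busy for `busy` cycles after it has been accessed.
    Returns the total number of cycles spent waiting for a busy bank.
    """
    free_at = [0] * banks
    clock = 0
    stalls = 0
    for m in range(n_access):
        bank = (m * stride) % banks
        if free_at[bank] > clock:
            stalls += free_at[bank] - clock
            clock = free_at[bank]
        free_at[bank] = clock + busy
        clock += 1
    return stalls


if __name__ == "__main__":
    n = 4
    a = [float(x) for x in range(n * n)]
    b = [float(x % 3) for x in range(n * n)]
    c1, s1 = matmul_colmajor(a, b, n, "stride1")
    cn, sn = matmul_colmajor(a, b, n, "striden")
    print(c1 == cn, s1, sn)
    for stride in (1, 8, 16):
        print(stride, stall_cycles(stride, 64, banks=16, busy=4))
```

Both variants compute the same product; only the order in which memory is touched differs. In the toy model, stride one spreads consecutive accesses over all banks and never waits, whereas a stride equal to the number of banks sends every access to the same bank, so each access after the first waits for the bank to recover.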
