Block algorithms for sparse matrix computations on high performance workstations

In this paper we analyze the use of Blocking (tiling), Data Precopying and Software Pipelining to improve the performance of sparse matrix computations on superscalar workstations. In particular, we analyze the Sparse Matrix by dense Matrix multiplication. The analysis focuses on the practical issues that arise when programming such a problem on current workstations with several levels of memory. The problem is studied on the Alpha 21064 based DEC 3000/800 workstation. Simulations of the memory hierarchy are also used to understand the behaviour of the algorithms. The results show a clear difference between the dense and the sparse case in the trade-offs required to optimize the algorithms. The analysis can be of interest to designers of numerical libraries and compilers.