Block algorithms for sparse matrix computations on high performance workstations

In this paper we analyze the use of Blocking (tiling), Data Precopying and Software Pipelining to improve the performance of sparse matrix computations on superscalar workstations. In particular, we analyze the Sparse Matrix by dense Matrix multiplication. The analysis focuses on the practical issues that arise when programming such a problem on current workstations with several levels of memory. The problem is studied on the Alpha 21064 based DEC 3000/800 workstation. Simulations of the memory hierarchy are also used to understand the behaviour of the algorithms. The results show a clear difference between the dense and the sparse case in the trade-offs required to optimize the algorithms. The analysis can be of interest to designers of numerical libraries and compilers.