Efficient sparse matrix factorization on high performance workstations—exploiting the memory hierarchy

The performance of workstation-class machines has increased dramatically in the recent past. Relatively inexpensive machines offering 10-20 MIPS and 1-5 MFLOPS of performance are now available, and machines with even higher performance are not far off. One important characteristic of these machines is that they rely on a small amount of high-speed cache memory for their increased performance. In this paper, we consider the problem of Cholesky factorization of a large sparse positive definite system of equations on a high-performance workstation. We find that the major factor limiting performance is the cost of moving data between memory and the processor. We use two techniques to address this limitation: we decrease the number of memory references, and we improve cache behavior to decrease the cost of each reference. Using benchmarks from the Harwell-Boeing Sparse Matrix Collection, experiments on a DECstation 3100 show that the resulting factorization code is almost three times as fast as SPARSPAK. We believe that the issues brought up in this paper will play an important role in the effective use of high-performance workstations on large numerical problems.
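
To give a rough sense of the first technique (reducing memory references), the sketch below fuses two column updates of a Cholesky-style kernel into one loop, so each destination entry is read and written once per pair of updates instead of once per update. This is a minimal illustration under our own assumptions, not the kernel from the paper; the function names, the unrolling depth of two, and the dense-vector representation are all illustrative.

    #include <stdio.h>

    /* One column update: each call reads and writes every dest[i]. */
    static void update_one(double *dest, const double *src, double mult, int n)
    {
        for (int i = 0; i < n; i++)
            dest[i] -= mult * src[i];
    }

    /* Two column updates fused: dest[i] is read and written once for the
       pair, roughly halving memory traffic to the destination column. */
    static void update_two(double *dest,
                           const double *src1, double m1,
                           const double *src2, double m2, int n)
    {
        for (int i = 0; i < n; i++)
            dest[i] -= m1 * src1[i] + m2 * src2[i];
    }

    int main(void)
    {
        double dest[4] = {10, 10, 10, 10};
        double a[4]    = {1, 2, 3, 4};
        double b[4]    = {4, 3, 2, 1};

        /* Same arithmetic either way; the fused version touches dest less:
           update_one(dest, a, 0.5, 4); update_one(dest, b, 0.25, 4); */
        update_two(dest, a, 0.5, b, 0.25, 4);

        for (int i = 0; i < 4; i++)
            printf("%g ", dest[i]);
        printf("\n");
        return 0;
    }

The second technique, improving cache behavior, amounts to ordering the work so that data brought into the cache is reused before it is evicted; the fused loop above is one small instance, since both source columns are combined while the destination column is still resident.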