Optimizing Sparse Matrix Vector Multiplication on SMPs

We describe optimizations of sparse matrix-vector multiplication on uniprocessors and SMPs. The optimization techniques include register blocking, cache blocking, and matrix reordering. We focus on optimizations that improve performance on SMPs, in particular, matrix reordering implemented using two different graph algorithms. We present a performance study of this algorithmic kernel, showing how the optimization techniques affect absolute performance and scalability, how they interact with one another, and how the performance benefits depend on matrix structure.
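To make the kernel concrete, the following is a minimal sketch of sparse matrix-vector multiplication, y = Ax, in two forms: a baseline over compressed sparse row (CSR) storage and a variant over 2x2 blocked storage, which is one common way to realize register blocking. The storage formats, the fixed 2x2 block size, and all function and parameter names (`spmv_csr`, `spmv_bcsr_2x2`, `row_ptr`, `col_idx`, `block_values`, and so on) are assumptions made for illustration and are not taken from the implementation studied here. Python is used for readability only; the reuse of source and destination elements that register blocking targets pays off in a compiled language.

```python
from typing import List, Sequence


def spmv_csr(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Baseline y = A*x with A in CSR form (hypothetical layout).

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] is the column of the k-th stored nonzero.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access to x: this is the irregular load that
            # blocking and reordering try to make cheaper.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def spmv_bcsr_2x2(block_row_ptr: Sequence[int], block_col_idx: Sequence[int],
                  block_values: Sequence[float],
                  x: Sequence[float]) -> List[float]:
    """y = A*x with A stored as dense 2x2 blocks (assumed block size).

    Each block holds 4 values in row-major order, so block b occupies
    block_values[4*b : 4*b + 4]. One column index is stored per block
    instead of one per nonzero.
    """
    n_block_rows = len(block_row_ptr) - 1
    y = [0.0] * (2 * n_block_rows)
    for bi in range(n_block_rows):
        # Two accumulators stay live across the whole block row.
        y0 = 0.0
        y1 = 0.0
        for b in range(block_row_ptr[bi], block_row_ptr[bi + 1]):
            j = 2 * block_col_idx[b]
            # Two source elements are loaded once and used twice each.
            x0, x1 = x[j], x[j + 1]
            v = 4 * b
            y0 += block_values[v] * x0 + block_values[v + 1] * x1
            y1 += block_values[v + 2] * x0 + block_values[v + 3] * x1
        y[2 * bi] = y0
        y[2 * bi + 1] = y1
    return y
```

Both routines compute the same product when the blocked matrix stores the same entries. The blocked form reduces index storage and indirect loads, but any zeros needed to fill out a block are stored and multiplied explicitly, which is one reason the benefit would be expected to depend on matrix structure.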