Perfomance Models for Blocked Sparse Matrix-Vector Multiplication Kernels

Sparse Matrix-Vector multiplication (SpMV) is a very challenging computational kernel, since its performance depends greatly on both the input matrix and the underlying architecture. The main problem of SpMV is its high demands on memory bandwidth, which cannot yet be abudantly offered from modern commodity architectures. One of the most promising optimization techniques for SpMV is blocking, which can reduce the indexing structures for storing a sparse matrix, and therefore alleviate the pressure to the memory subsystem. However, blocking methods can severely degrade performance if not used properly. In this paper, we study and evaluate a number of representative blocking storage formats and present a performance model that can accurately select the most suitable blocking storage format and the corresponding block shape and size for a specific sparse matrix. Our model considers both the memory and computational part of the kernel, which can be non-negligible when applying blocking, and also assumes an overlapping of memory accesses and computations that modern commodity architectures can offer through hardware prefetching mechanisms.

[1]  James Demmel,et al.  Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[2]  Hyun Jin Moon,et al.  Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure , 2005, HPCC.

[3]  Youcef Saad,et al.  A Basic Tool Kit for Sparse Matrix Computations , 1990 .

[4]  Ramesh C. Agarwal,et al.  A high performance algorithm using pre-processing for the sparse matrix-vector multiplication , 1992, Proceedings Supercomputing '92.

[5]  Nectarios Koziris,et al.  Exploring the effect of block shapes on the performance of sparse kernels , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[6]  D. Keyes,et al.  Toward Realistic Performance Bounds for Implicit CFD , 1999 .

[7]  Richard Vuduc,et al.  Automatic performance tuning of sparse matrix kernels , 2003 .

[8]  Richard Barrett,et al.  Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods , 1994, Other Titles in Applied Mathematics.

[9]  Samuel Williams,et al.  Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2009, Parallel Comput..

[10]  Nectarios Koziris,et al.  Improving the Performance of Multithreaded Sparse Matrix-Vector Multiplication Using Index and Value Compression , 2008, 2008 37th International Conference on Parallel Processing.

[11]  E. Im,et al.  Optimizing Sparse Matrix Vector Multiplication on SMP , 1999, PPSC.

[12]  Nectarios Koziris,et al.  Understanding the Performance of Sparse Matrix-Vector Multiplication , 2008, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008).

[13]  Katherine A. Yelick,et al.  Optimizing Sparse Matrix Vector Multiplication on SMP , 1999, SIAM Conference on Parallel Processing for Scientific Computing.

[14]  Michael T. Heath,et al.  Improving Performance of Sparse Matrix-Vector Multiplication , 1999, ACM/IEEE SC 1999 Conference (SC'99).

[15]  Andrew Lumsdaine,et al.  Accelerating sparse matrix computations via data compression , 2006, ICS '06.

[16]  Victor Eijkhout,et al.  Performance Optimization and Modeling of Blocked Sparse Kernels , 2007, Int. J. High Perform. Comput. Appl..

[17]  Katherine A. Yelick,et al.  Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY , 2001, International Conference on Computational Science.

[18]  Olivier Temam,et al.  Characterizing the behavior of sparse algorithms on caches , 1992, Proceedings Supercomputing '92.

[19]  Timothy A. Davis,et al.  The university of Florida sparse matrix collection , 2011, TOMS.