Optimizing sparse matrix-vector multiplication using index and value compression

Previous research has identified memory bandwidth as the main bottleneck of the ubiquitous sparse matrix-vector multiplication (SpMV) kernel. To attack this problem, we aim to reduce the overall data volume of the algorithm. Typical sparse matrix representation schemes store only the non-zero elements of the matrix and use additional indexing information to iterate over these elements. In this paper we propose two distinct compression methods, one targeting the indexing information and one targeting the numerical values. We perform a set of experiments on a large set of real-world matrices and demonstrate that the index compression method can be applied successfully to a wide range of matrices. Moreover, the value compression method achieves substantial speedups for a more limited, yet important, class of sparse matrices: those that contain a small number of distinct values.
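The abstract does not spell out the two encodings. The Python sketch below illustrates the general idea under our own assumptions. It uses a baseline CSR kernel. Column indices are delta-encoded per row and stored in the narrowest unsigned type that fits, which stands in for index compression. Values are replaced by one-byte indices into a table of distinct values, which stands in for value compression. The function names (`csr_from_dense`, `compress_indices`, `compress_values`, `spmv_compressed`) and the single-width delta scheme are hypothetical simplifications, not the paper's actual formats.

```python
from array import array


def csr_from_dense(dense):
    """Build CSR arrays (row_ptr, col_idx, values) from a dense list of rows.

    Indices use array('I') (4 bytes on common platforms), values use
    array('d') (8-byte doubles), mirroring a typical baseline CSR layout.
    """
    row_ptr, col_idx, values = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(values))
    return array('I', row_ptr), array('I', col_idx), array('d', values)


def spmv_csr(row_ptr, col_idx, values, x):
    """Baseline CSR SpMV: y = A * x."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def compress_indices(row_ptr, col_idx):
    """Hypothetical index compression: per-row delta encoding.

    Assumes column indices are sorted within each row. The first entry of
    each row keeps its absolute column; later entries store the gap to the
    previous column. All deltas share the narrowest unsigned type that fits
    the largest one (a simplification of any real variable-width scheme).
    """
    deltas = []
    for i in range(len(row_ptr) - 1):
        prev = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            deltas.append(col_idx[k] - prev)
            prev = col_idx[k]
    largest = max(deltas, default=0)
    code = 'B' if largest < 2**8 else 'H' if largest < 2**16 else 'I'
    return array(code, deltas)


def compress_values(values):
    """Hypothetical value compression: a table of distinct values plus
    one-byte indices into it (assumes at most 256 distinct values)."""
    table = sorted(set(values))
    if len(table) > 256:
        raise ValueError("this sketch supports at most 256 distinct values")
    position = {v: i for i, v in enumerate(table)}
    return array('d', table), array('B', (position[v] for v in values))


def spmv_compressed(row_ptr, deltas, table, val_idx, x):
    """SpMV over the compressed arrays; decodes columns and values on the fly."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        col = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            col += deltas[k]  # rebuild the absolute column index
            y[i] += table[val_idx[k]] * x[col]
    return y


# Small example: a 4x4 matrix with only two distinct non-zero values.
dense = [
    [2.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
]
x = [1.0, 2.0, 3.0, 4.0]

row_ptr, col_idx, values = csr_from_dense(dense)
deltas = compress_indices(row_ptr, col_idx)
table, val_idx = compress_values(values)
print(spmv_csr(row_ptr, col_idx, values, x))                # [6.0, 2.0, 7.0, 11.0]
print(spmv_compressed(row_ptr, deltas, table, val_idx, x))  # [6.0, 2.0, 7.0, 11.0]
```

In this toy matrix, each column index shrinks from an `array('I')` item (typically 4 bytes) to 1 byte, and each value shrinks from 8 bytes to a 1-byte index plus a shared two-entry table, while the product is unchanged. Python only illustrates the data layout; the bandwidth savings the abstract targets would show up in a compiled kernel, where the narrower loads directly reduce memory traffic at the cost of some extra decoding work.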
