Exploiting compression opportunities to improve SpMxV performance on shared memory systems

The Sparse Matrix-Vector Multiplication (SpMxV) kernel exhibits poor scaling on shared memory systems, due to the streaming nature of its data access pattern. To decrease memory contention and improve kernel performance we propose two compression schemes: CSR-DU, that targets the reduction of the matrix structural data by applying coarse-grained delta-encoding, and CSR-VI, that targets the reduction of the values using indirect indexing, applicable to matrices with a small number of unique values. Thorough experimental evaluation of the proposed methods and their combination, on two modern shared memory systems, demonstrated that they can significantly improve multithreaded SpMxV performance upon standard and state-of-the-art approaches.

[1]  Roman Geus,et al.  Towards a fast parallel sparse matrix-vector multiplication , 2000, PARCO.

[2]  Stamatis Vassiliadis,et al.  Parallel Computer Architecture , 2000, Euro-Par.

[3]  Katherine Yelick,et al.  Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply , 2004 .

[4]  Francisco F. Rivera,et al.  Improving the locality of the sparse matrix-vector product on shared memory multiprocessors , 2004, 12th Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2004. Proceedings..

[5]  A. Pinar,et al.  Improving Performance of Sparse Matrix-Vector Multiplication , 1999, ACM/IEEE SC 1999 Conference (SC'99).

[6]  Olivier Temam,et al.  Characterizing the behavior of sparse algorithms on caches , 1992, Proceedings Supercomputing '92.

[7]  David A. Patterson,et al.  Computer Architecture, Fifth Edition: A Quantitative Approach , 2011 .

[8]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[9]  J. Dongarra,et al.  Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy (Revisiting Iterative Refinement for Linear Systems) , 2006, ACM/IEEE SC 2006 Conference (SC'06).

[10]  Timothy A. Davis,et al.  The university of Florida sparse matrix collection , 2011, TOMS.

[11]  Richard Barrett,et al.  Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods , 1994, Other Titles in Applied Mathematics.

[12]  Katherine A. Yelick,et al.  Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY , 2001, International Conference on Computational Science.

[13]  David E. Keyes,et al.  Four Horizons for Enhancing the Performance of Parallel Simulations Based on Partial Differential Equations , 2000, Euro-Par.

[14]  Katherine A. Yelick,et al.  Optimizing Sparse Matrix Vector Multiplication on SMP , 1999, SIAM Conference on Parallel Processing for Scientific Computing.

[15]  Hyun Jin Moon,et al.  Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure , 2005, HPCC.

[16]  Samuel Williams,et al.  The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .

[17]  Youcef Saad,et al.  A Basic Tool Kit for Sparse Matrix Computations , 1990 .

[18]  Calvin J. Ribbens,et al.  Pattern-based sparse matrix representation for memory-efficient SMVM kernels , 2009, ICS.

[19]  Nectarios Koziris,et al.  Improving the Performance of Multithreaded Sparse Matrix-Vector Multiplication Using Index and Value Compression , 2008, 2008 37th International Conference on Parallel Processing.

[20]  Martin Burtscher,et al.  High Throughput Compression of Double-Precision Floating-Point Data , 2007, 2007 Data Compression Conference (DCC'07).

[21]  Nectarios Koziris,et al.  Understanding the Performance of Sparse Matrix-Vector Multiplication , 2008, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008).

[22]  Ümit V. Çatalyürek,et al.  Decomposing Irregularly Sparse Matrices for Parallel Matrix-Vector Multiplication , 1996, IRREGULAR.

[23]  W. K. Anderson,et al.  Achieving High Sustained Performance in an Unstructured Mesh CFD Application , 1999, ACM/IEEE SC 1999 Conference (SC'99).

[24]  Sivan Toledo,et al.  Improving the memory-system performance of sparse-matrix vector multiplication , 1997, IBM J. Res. Dev..

[25]  James Demmel,et al.  Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply , 2004, International Conference on Parallel Processing, 2004. ICPP 2004..

[26]  David Moloney,et al.  Streaming Sparse Matrix Compression/Decompression , 2005, HiPEAC.

[27]  Nectarios Koziris,et al.  Optimizing sparse matrix-vector multiplication using index and value compression , 2008, CF '08.

[28]  E. Im,et al.  Optimizing Sparse Matrix Vector Multiplication on SMP , 1999, PPSC.

[29]  P. Sadayappan,et al.  On improving the performance of sparse matrix-vector multiplication , 1997, Proceedings Fourth International Conference on High-Performance Computing.

[30]  Anoop Gupta,et al.  Parallel computer architecture - a hardware / software approach , 1998 .

[31]  Nectarios Koziris,et al.  Performance evaluation of the sparse matrix-vector multiplication on modern architectures , 2009, The Journal of Supercomputing.

[32]  Martin Hopkins,et al.  Synergistic Processing in Cell's Multicore Architecture , 2006, IEEE Micro.

[33]  Yousef Saad,et al.  Iterative methods for sparse linear systems , 2003 .

[34]  Samuel Williams,et al.  Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[35]  Michael Lang,et al.  A Performance Evaluation of the Nehalem Quad-Core Processor for Scientific Computing , 2008, Parallel Process. Lett..

[36]  John M. Mellor-Crummey,et al.  Optimizing Sparse Matrix–Vector Product Computations Using Unroll and Jam , 2004, Int. J. High Perform. Comput. Appl..

[37]  D. Geer,et al.  Chip makers turn to multicore processors , 2005, Computer.

[38]  Andrew Lumsdaine,et al.  Accelerating sparse matrix computations via data compression , 2006, ICS '06.

[39]  Srihari Makineni,et al.  Exploring the cache design space for large scale CMPs , 2005, CARN.

[40]  James Demmel,et al.  Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply , 2002, ACM/IEEE SC 2002 Conference (SC'02).