SparseX: A Library for High-Performance Sparse Matrix-Vector Multiplication on Multicore Platforms

The Sparse Matrix-Vector Multiplication (SpMV) kernel ranks among the most important and thoroughly studied linear algebra operations: it lies at the heart of many iterative methods for solving sparse linear systems and often constitutes a severe performance bottleneck. Its optimization, which is intimately tied to the data structure used to store the sparse matrix, has long been of particular interest to the applied mathematics and computer science communities, and it has attracted further attention since the advent of multicore architectures. In this article, we present SparseX, an open-source software package for SpMV on multicore platforms. SparseX employs the state-of-the-art Compressed Sparse eXtended (CSX) storage format to deliver high efficiency through a highly usable “BLAS-like” interface that requires little or no tuning. Performance results indicate that our library outperforms competing libraries on large-scale problems.
