Dynamic Sparse-Matrix Allocation on GPUs

Sparse matrices are a core component of many numerical simulations, and the efficiency of sparse-matrix operations is essential to achieving high performance. Dynamic sparse-matrix allocation (insertion) can benefit a number of problems, such as sparse-matrix factorization, sparse-matrix-matrix addition, static analysis (e.g., points-to analysis), computing transitive closure, and other graph algorithms. Existing sparse-matrix formats are poorly suited to dynamic updates. The compressed sparse-row (CSR) format is fully compact and must be rebuilt after each new entry. ELLPACK (ELL) stores a constant number of entries per row, which allows for efficient insertion and sparse matrix-vector multiplication (SpMV), but it is memory inefficient and strictly limits row size. The coordinate (COO) format stores a list of entries and is efficient in both memory use and insertion time; however, it is much less efficient at SpMV. Hybrid ELLPACK (HYB) compromises by combining ELL and COO, but its performance degrades as the COO portion fills up, because rows that use the COO portion require it to be completely traversed during every SpMV operation.
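The insertion trade-offs described above can be illustrated with a small sequential sketch. It is not the GPU implementation discussed here; the class names (`COO`, `ELL`, `CSR`, `HYB`), the `spmv_into` helper, and the `width` parameter are assumptions made for illustration, each (row, column) pair is assumed to be inserted at most once, and the in-place shift in `CSR.insert` stands in for the rebuild that a compact format requires.

```python
from dataclasses import dataclass, field


@dataclass
class COO:
    """Coordinate format: an unordered list of (row, col, value) entries."""
    rows: list = field(default_factory=list)
    cols: list = field(default_factory=list)
    vals: list = field(default_factory=list)

    def insert(self, r, c, v):
        # Insertion is a constant-time append.
        self.rows.append(r)
        self.cols.append(c)
        self.vals.append(v)

    def spmv_into(self, x, y):
        # SpMV must walk every stored entry, whichever row it belongs to.
        for r, c, v in zip(self.rows, self.cols, self.vals):
            y[r] += v * x[c]
        return y


class ELL:
    """ELL format: exactly `width` slots per row; unused slots have col -1."""

    def __init__(self, n_rows, width):
        self.width = width
        self.cols = [[-1] * width for _ in range(n_rows)]
        self.vals = [[0.0] * width for _ in range(n_rows)]
        self.fill = [0] * n_rows  # number of used slots in each row

    def insert(self, r, c, v):
        k = self.fill[r]
        if k == self.width:
            return False  # row is full: the format cannot hold this entry
        self.cols[r][k], self.vals[r][k] = c, v
        self.fill[r] = k + 1
        return True

    def spmv(self, x):
        return [sum(v * x[c] for c, v in zip(cr, vr) if c >= 0)
                for cr, vr in zip(self.cols, self.vals)]


class CSR:
    """CSR format: entries packed row by row, indexed by row_ptr."""

    def __init__(self, n_rows):
        self.row_ptr = [0] * (n_rows + 1)
        self.cols, self.vals = [], []

    def insert(self, r, c, v):
        # No slack space: later entries shift and later row pointers change,
        # so a single insertion costs O(nnz + n_rows).
        pos = self.row_ptr[r + 1]
        self.cols.insert(pos, c)
        self.vals.insert(pos, v)
        for i in range(r + 1, len(self.row_ptr)):
            self.row_ptr[i] += 1

    def spmv(self, x):
        return [sum(self.vals[k] * x[self.cols[k]]
                    for k in range(self.row_ptr[i], self.row_ptr[i + 1]))
                for i in range(len(self.row_ptr) - 1)]


class HYB:
    """HYB format: ELL for the first `width` entries of a row, COO overflow."""

    def __init__(self, n_rows, width):
        self.ell, self.coo = ELL(n_rows, width), COO()

    def insert(self, r, c, v):
        if not self.ell.insert(r, c, v):
            self.coo.insert(r, c, v)

    def spmv(self, x):
        # The whole COO overflow list is traversed on every call.
        return self.coo.spmv_into(x, self.ell.spmv(x))
```

In this sketch, `HYB(3, 2)` accepts a third entry in a row by spilling it into the COO list, whereas `ELL(3, 2).insert` returns `False` for the same entry. That is the row-size limit and the overflow cost the paragraph above describes.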
