Paper Information - Parallel and Scalable Sparse Basic Linear Algebra Subprograms

Sparse basic linear algebra subprograms (BLAS) are fundamental building blocks for numerous scientific computations and graph applications. Compared with Dense BLAS, parallelizing Sparse BLAS routines poses extra challenges because sparse data structures are irregular. This thesis proposes new fundamental algorithms and data structures that accelerate Sparse BLAS routines on modern massively parallel processors: (1) a new heap data structure named ad-heap, for faster heap operations on heterogeneous processors, (2) a new sparse matrix representation named CSR5, for faster sparse matrix-vector multiplication (SpMV) on homogeneous processors such as CPUs, GPUs and Xeon Phi, (3) a new CSR-based SpMV algorithm for a variety of tightly coupled CPU-GPU heterogeneous processors, and (4) a new framework and associated algorithms for sparse matrix-matrix multiplication (SpGEMM) on GPUs and heterogeneous processors. The thesis compares the proposed methods with state-of-the-art approaches on six homogeneous and five heterogeneous processors from Intel, AMD and NVIDIA. Using a total of 38 sparse matrices as a benchmark suite, the experimental results show that the proposed methods obtain significant performance improvements over the best existing algorithms.
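For readers unfamiliar with the SpMV operation the abstract refers to, the following is a minimal sequential sketch of y = A x over the standard CSR (compressed sparse row) format. It is not the thesis's CSR5 format or its heterogeneous algorithm; the function name `csr_spmv` and the small example matrix are assumptions made purely for illustration.

```python
from typing import List


def csr_spmv(row_ptr: List[int], col_idx: List[int],
             values: List[float], x: List[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in standard CSR form.

    row_ptr[i] .. row_ptr[i + 1] delimits the nonzeros of row i inside
    col_idx (column indices) and values (nonzero values).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Rows hold different numbers of nonzeros, so the inner loop length
        # varies per row: this is the irregularity the abstract mentions.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Illustrative 3x3 matrix (hypothetical data):
#   [[1, 0, 2],
#    [0, 0, 0],
#    [0, 3, 4]]
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 1, 2]
values = [1.0, 2.0, 3.0, 4.0]

y = csr_spmv(row_ptr, col_idx, values, [1.0, 1.0, 1.0])
print(y)  # [3.0, 0.0, 7.0]
```

Assigning one row per thread is the obvious way to parallelize this loop, but rows with very different nonzero counts then leave some threads idle while others are overloaded. That load imbalance is one example of the challenges the abstract attributes to irregular sparse data.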

Weifeng Liu
