Paper information - Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks

This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and Aᵀx to be computed efficiently in parallel, where A is an n×n sparse matrix with nnz ≥ n nonzeros and x is a dense n-vector. Our algorithms use Θ(nnz) work (serial running time) and Θ(√n lg n) span (critical-path length), yielding a parallelism of Θ(nnz/(√n lg n)), which is amply high for virtually any large matrix. The storage requirement for CSB is the same as that for the more-standard compressed-sparse-rows (CSR) format, for which computing Ax in parallel is easy but Aᵀx is difficult. Benchmark results indicate that on one processor, the CSB algorithms for Ax and Aᵀx run just as fast as the CSR algorithm for Ax, but the CSB algorithms also scale up linearly with processors until limited by off-chip memory bandwidth.
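
The abstract does not spell out the CSB layout, so the following is only a minimal serial sketch of the general idea, under stated assumptions: the matrix is partitioned into β×β blocks, and each nonzero is stored with its row and column offsets inside its block. The names `build_blocked`, `multiply`, and `beta` are hypothetical and chosen for illustration. The sketch does not reproduce the paper's parallel algorithms or its work and span bounds. It only shows why a block layout that keeps both coordinates treats Ax and Aᵀx symmetrically, whereas a row-oriented layout such as CSR favors Ax.

```python
from collections import defaultdict


def build_blocked(beta, triples):
    """Group (row, col, value) triples into beta-by-beta blocks.

    Returns a dict mapping (block_row, block_col) to a list of
    (local_row, local_col, value) entries. This dict-of-lists layout
    is an assumption made for readability, not the paper's layout.
    """
    blocks = defaultdict(list)
    for i, j, v in triples:
        blocks[(i // beta, j // beta)].append((i % beta, j % beta, v))
    return dict(blocks)


def multiply(blocks, beta, x, transpose=False):
    """Compute y = A x, or y = A^T x when transpose is True (A is square)."""
    y = [0.0] * len(x)
    for (bi, bj), entries in blocks.items():
        for li, lj, v in entries:
            # Recover global coordinates from block index and local offset.
            i, j = bi * beta + li, bj * beta + lj
            if transpose:
                y[j] += v * x[i]  # entry (i, j) of A is entry (j, i) of A^T
            else:
                y[i] += v * x[j]
    return y


# A small 4x4 example with 2x2 blocks.
triples = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0),
           (2, 3, 4.0), (3, 0, 5.0), (3, 3, 6.0)]
blocks = build_blocked(2, triples)
x = [1.0, 2.0, 3.0, 4.0]
y = multiply(blocks, 2, x)
yt = multiply(blocks, 2, x, transpose=True)
```

Both products traverse the same blocks and differ only in which coordinate indexes the input vector and which indexes the output, so neither orientation needs a second copy of the matrix.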
