Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication
暂无分享,去创建一个
Samuel Williams | Leonid Oliker | James Demmel | Aydin Buluç | J. Demmel | Samuel Williams | A. Buluç | L. Oliker
[1] Marcin Paprzycki,et al. On BLAS Operations with Recursively Stored Sparse Matrices , 2010, 2010 12th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing.
[2] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[3] John R. Gilbert,et al. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks , 2009, SPAA '09.
[4] Larry Carter,et al. Sparse Tiling for Stationary Iterative Methods , 2004, Int. J. High Perform. Comput. Appl..
[5] Roman Geus,et al. Towards a fast parallel sparse symmetric matrix-vector multiplication , 2001, Parallel Comput..
[6] Richard Vuduc,et al. Automatic performance tuning of sparse matrix kernels , 2003 .
[7] Ronald L. Rivest,et al. Introduction to Algorithms, third edition , 2009 .
[8] Nectarios Koziris,et al. Optimizing sparse matrix-vector multiplication using index and value compression , 2008, CF '08.
[9] Sivan Toledo,et al. Improving the memory-system performance of sparse-matrix vector multiplication , 1997, IBM J. Res. Dev..
[10] James Reinders,et al. Intel® threading building blocks , 2008 .
[11] Calvin J. Ribbens,et al. Pattern-based sparse matrix representation for memory-efficient SMVM kernels , 2009, ICS.
[12] Roman Geus,et al. Towards a fast parallel sparse matrix-vector multiplication , 2000, PARCO.
[13] Robert D. Blumofe,et al. Scheduling multithreaded computations by work stealing , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.
[14] Olivier Temam,et al. Characterizing the behavior of sparse algorithms on caches , 1992, Proceedings Supercomputing '92.
[15] Marcin Dabrowski,et al. Parallel symmetric sparse matrix-vector product on scalar multi-core CPUs , 2010, Parallel Comput..
[16] E. Cuthill,et al. Reducing the bandwidth of sparse symmetric matrices , 1969, ACM '69.
[17] Andrew Lumsdaine,et al. Accelerating sparse matrix computations via data compression , 2006, ICS '06.
[18] John R. Gilbert,et al. On the representation and multiplication of hypersparse matrices , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[19] Richard W. Vuduc,et al. Sparsity: Optimization Framework for Sparse Matrix Kernels , 2004, Int. J. High Perform. Comput. Appl..
[20] Rob H. Bisseling,et al. Cache-Oblivious Sparse Matrix--Vector Multiplication by Using Sparse Matrix Partitioning Methods , 2009, SIAM J. Sci. Comput..
[21] Brendan Vastenhouw,et al. A Two-Dimensional Data Distribution Method for Parallel Sparse Matrix-Vector Multiplication , 2005, SIAM Rev..
[22] P. Sadayappan,et al. On improving the performance of sparse matrix-vector multiplication , 1997, Proceedings Fourth International Conference on High-Performance Computing.
[23] Ümit V. Çatalyürek,et al. Hypergraph-Partitioning-Based Decomposition for Parallel Sparse-Matrix Vector Multiplication , 1999, IEEE Trans. Parallel Distributed Syst..
[24] James Demmel,et al. Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply , 2004, International Conference on Parallel Processing, 2004. ICPP 2004..
[25] Bradley C. Kuszmaul,et al. Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.
[26] Timothy A. Davis,et al. The university of Florida sparse matrix collection , 2011, TOMS.
[27] Samuel Williams,et al. Auto-tuning performance on multicore computers , 2008 .
[28] Alejandro Duran,et al. The Design of OpenMP Tasks , 2009, IEEE Transactions on Parallel and Distributed Systems.
[29] Samuel Williams,et al. Memory-efficient optimization of Gyrokinetic particle-to-grid interpolation for multicore processors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[30] Michael A. Bender,et al. Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model , 2007, SPAA '07.
[31] Hyun Jin Moon,et al. Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure , 2005, HPCC.
[32] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005 .
[33] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.