Design Principles for Sparse Matrix Multiplication on the GPU
[1] Andrew V. Knyazev, et al. Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method, 2001, SIAM J. Sci. Comput.
[2] John R. Gilbert, et al. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks, 2009, SPAA '09.
[3] E. M. Garzón, et al. A matrix approach to tomographic reconstruction and its implementation on GPUs, 2010, Journal of structural biology.
[4] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[5] Ümit V. Çatalyürek, et al. Regularizing graph centrality computations, 2015, J. Parallel Distributed Comput.
[6] Alexander Tiskin, et al. All-Pairs Shortest Paths Computation in the BSP Model, 2001, ICALP.
[7] Samuel Williams, et al. Optimizing Sparse Matrix-Multiple Vectors Multiplication for Nuclear Configuration Interaction Calculations, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[8] Srinivasan Parthasarathy, et al. Efficient sparse-matrix multi-vector product on GPUs, 2018, HPDC.
[9] Efstratios Gallopoulos, et al. An Iterative Method for Nonsymmetric Systems with Multiple Right-Hand Sides, 1995, SIAM J. Sci. Comput.
[10] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[11] Robert N. M. Watson, et al. Into the depths of C: elaborating the de facto standards, 2016, PLDI.
[12] Pradeep Ravikumar, et al. Large Scale Distributed Sparse Precision Estimation, 2013, NIPS.
[13] Michael Garland, et al. Merge-Based Parallel Sparse Matrix-Vector Multiplication, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] Francisco Vázquez, et al. FastSpMM: An Efficient Library for Sparse Matrix Matrix Product on GPUs, 2014, Comput. J.
[15] Luke N. Olson, et al. Optimizing Sparse Matrix-Matrix Multiplication for the GPU, 2015, ACM Trans. Math. Softw.
[16] Scott McMillan, et al. Design of the GraphBLAS API for C, 2017, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[17] Jack Dongarra, et al. Templates for the Solution of Algebraic Eigenvalue Problems, 2000, Software, environments, tools.
[18] Davide Barbieri, et al. Sparse Matrix-Vector Multiplication on GPGPUs, 2017, ACM Trans. Math. Softw.
[19] Michael Garland, et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[20] Riko Jacob, et al. The I/O Complexity of Sparse Matrix Dense Matrix Multiplication, 2010, LATIN.
[21] Haesun Park, et al. A high-performance parallel algorithm for nonnegative matrix factorization, 2015, PPoPP.
[22] Jack J. Dongarra, et al. Accelerating the LOBPCG method on GPUs using a blocked sparse matrix vector product, 2015, SpringSim.
[23] Timothy A. Davis, et al. The University of Florida sparse matrix collection, 2011, TOMS.
[24] Inderjit S. Dhillon, et al. Multi-Scale Spectral Decomposition of Massive Graphs, 2014, NIPS.
[25] Maurice Herlihy, et al. Warp-aware trace scheduling for GPUs, 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).