Distributed-memory parallel algorithms for sparse times tall-skinny-dense matrix multiplication
Katherine A. Yelick | Alok Tripathy | Israt Nisa | Aydin Buluç | Oguz Selvitopi | Benjamin Brock
[1] Lynn Elliot Cannon, et al. A cellular computer to implement the Kalman filter algorithm, 1969.
[2] Alok Aggarwal, et al. Communication Complexity of PRAMs, 1990, Theor. Comput. Sci.
[3] Ramesh C. Agarwal, et al. A three-dimensional approach to parallel matrix multiplication, 1995, IBM J. Res. Dev.
[4] Robert A. van de Geijn, et al. SUMMA: scalable universal matrix multiplication algorithm, 1995, Concurr. Pract. Exp.
[5] Andrew V. Knyazev, et al. Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method, 2001, SIAM J. Sci. Comput.
[6] Robert A. van de Geijn, et al. Collective communication: theory, practice, and experience: Research Articles, 2007.
[7] John R. Gilbert, et al. On the representation and multiplication of hypersparse matrices, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[8] James Demmel, et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms, 2011, Euro-Par.
[9] John R. Gilbert, et al. The Combinatorial BLAS: design, implementation, and applications, 2011, Int. J. High Perform. Comput. Appl.
[10] John R. Gilbert, et al. Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments, 2011, SIAM J. Sci. Comput.
[11] James Demmel, et al. Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[12] Samuel Williams, et al. Optimizing Sparse Matrix-Multiple Vectors Multiplication for Nuclear Configuration Interaction Calculations, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[13] Samuel Williams, et al. Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication, 2015, SIAM J. Sci. Comput.
[14] Cevdet Aykanat, et al. Improving performance of sparse matrix dense matrix multiplication on large-scale parallel systems, 2016, Parallel Comput.
[15] Martin D. Schatz, et al. Parallel Matrix Multiplication: A Systematic Journey, 2016, SIAM J. Sci. Comput.
[16] Leonid Oliker, et al. Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication, 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[17] Alexander Heinecke, et al. LIBXSMM: Accelerating Small Matrix Multiplications by Runtime Code Generation, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[18] John D. Owens, et al. Design Principles for Sparse Matrix Multiplication on the GPU, 2018, Euro-Par.
[19] Haesun Park, et al. MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization, 2016, IEEE Transactions on Knowledge and Data Engineering.
[20] Georgios A. Pavlopoulos, et al. HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks, 2018, Nucleic acids research.
[21] Weifeng Liu, et al. Register-Aware Optimizations for Parallel Sparse Matrix–Matrix Multiplication, 2019, International Journal of Parallel Programming.
[22] P. Sadayappan, et al. Adaptive sparse tiling for sparse matrix multiplication, 2019, PPoPP.
[23] Katherine Yelick, et al. BCL: A Cross-Platform Distributed Data Structures Library, 2018, ICPP.
[24] Torsten Hoefler, et al. Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication, 2019, SC.
[25] Minjie Wang, et al. FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems, 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[26] K. Yelick, et al. Reducing Communication in Graph Neural Network Training, 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[27] Erich Elsen, et al. Sparse GPU Kernels for Deep Learning, 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[28] Kyungyong Lee, et al. Performance Prediction of Sparse Matrix Multiplication on a Distributed BigData Processing Environment, 2020, 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C).
[29] Erich Elsen, et al. Fast Sparse ConvNets, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Yu Wang, et al. GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks, 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[31] Ziheng Wang, et al. SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference, 2020, PACT.
[32] Süreyya Emre Kurt, et al. Efficient Tiled Sparse Matrix Multiplication through Matrix Signatures, 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.