A novel data transformation and execution strategy for accelerating sparse matrix multiplication on GPUs
[1] Xuemin Lin, et al. Speedup Graph Processing by Graph Ordering, 2016, SIGMOD Conference.
[2] Jure Leskovec, et al. Mining of Massive Datasets, 2nd Ed, 2014.
[3] Viktor K. Prasanna, et al. ReCALL: Reordered Cache Aware Locality Based Graph Processing, 2017, 2017 IEEE 24th International Conference on High Performance Computing (HiPC).
[4] Rob H. Bisseling, et al. Cache-Oblivious Sparse Matrix-Vector Multiplication by Using Sparse Matrix Partitioning Methods, 2009, SIAM J. Sci. Comput.
[5] Max Welling, et al. Semi-Supervised Classification with Graph Convolutional Networks, 2016, ICLR.
[6] Jack J. Dongarra, et al. Accelerating the LOBPCG method on GPUs using a blocked sparse matrix vector product, 2015, SpringSim.
[7] Timothy A. Davis, et al. The University of Florida sparse matrix collection, 2011, TOMS.
[8] P. Sadayappan, et al. Adaptive sparse tiling for sparse matrix multiplication, 2019, PPoPP.
[9] Ümit V. Çatalyürek, et al. Regularizing graph centrality computations, 2015, J. Parallel Distributed Comput.
[10] Michael Garland, et al. Merge-based sparse matrix-vector multiplication (SpMV) using the CSR storage format, 2016, PPoPP.
[11] P. Sadayappan, et al. Sampled Dense Matrix Multiplication for High-Performance Machine Learning, 2018, 2018 IEEE 25th International Conference on High Performance Computing (HiPC).
[12] Richard W. Vuduc, et al. Model-driven autotuning of sparse matrix-vector multiply on GPUs, 2010, PPoPP '10.
[13] Katherine Yelick, et al. OSKI: A library of automatically tuned sparse matrix kernels, 2005.
[14] Peng Jiang, et al. Reusing Data Reorganization for Efficient SIMD Parallelization of Adaptive Irregular Applications, 2016, ICS.
[15] Peng Jiang, et al. Conflict-free vectorization of associative irregular applications with recent SIMD architectural advances, 2018, CGO.
[16] John Canny, et al. BIDMach: Large-scale Learning with Zero Memory Allocation, 2013.
[17] Hans-Peter Seidel, et al. Globally homogeneous, locally adaptive sparse matrix-vector multiplication on the GPU, 2017, ICS.
[18] Michalis K. Titsias, et al. The Infinite Gamma-Poisson Feature Model, 2007, NIPS.
[19] Srinivasan Parthasarathy, et al. Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] John F. Canny, et al. Collaborative filtering with privacy, 2002, Proceedings 2002 IEEE Symposium on Security and Privacy.
[21] Xipeng Shen, et al. On-the-fly elimination of dynamic irregularities for GPU computing, 2011, ASPLOS XVI.
[22] Vipin Kumar, et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, 1998, SIAM J. Sci. Comput.
[23] Francisco Vázquez, et al. FastSpMM: An Efficient Library for Sparse Matrix Matrix Product on GPUs, 2014, Comput. J.
[24] Yehuda Koren, et al. Matrix Factorization Techniques for Recommender Systems, 2009, Computer.
[25] Peng Jiang, et al. Exploiting recent SIMD architectural advances for irregular applications, 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[26] Feng Shi, et al. Sparse Matrix Format Selection with Multiclass SVM for SpMV on GPU, 2016, 2016 45th International Conference on Parallel Processing (ICPP).
[27] Leonid Oliker, et al. Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations, 2013, SIAM Rev.
[28] Joseph L. Greathouse, et al. Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices, 2015, 2015 IEEE 22nd International Conference on High Performance Computing (HiPC).
[29] Srinivasan Parthasarathy, et al. Efficient sparse-matrix multi-vector product on GPUs, 2018, HPDC.
[30] John D. Owens, et al. Design Principles for Sparse Matrix Multiplication on the GPU, 2018, Euro-Par.
[31] Samuel Williams, et al. Optimizing Sparse Matrix-Multiple Vectors Multiplication for Nuclear Configuration Interaction Calculations, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[32] Nectarios Koziris, et al. CSX: an extended compression format for spmv on shared memory systems, 2011, PPoPP '11.
[33] Ken Kennedy, et al. Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings, 2001, International Journal of Parallel Programming.
[34] Shoaib Kamil, et al. The tensor algebra compiler, 2017, Proc. ACM Program. Lang.
[35] Michelle Mills Strout, et al. A Fast Parallel Graph Partitioner for Shared-Memory Inspector/Executor Strategies, 2012, LCPC.
[36] Philip S. Yu, et al. A Comprehensive Survey on Graph Neural Networks, 2019, IEEE Transactions on Neural Networks and Learning Systems.
[37] Ronald L. Rivest, et al. Introduction to Algorithms, third edition, 2009.
[38] Xin-She Yang, et al. Introduction to Algorithms, 2021, Nature-Inspired Optimization Algorithms.
[39] Ryan A. Rossi, et al. The Network Data Repository with Interactive Graph Analytics and Visualization, 2015, AAAI.
[40] Michael Garland, et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.