SpMV Profiling and Optimization Analysis
[1] Ester M. Garzón, et al. Improving the Performance of the Sparse Matrix Vector Product with GPUs, 2010, 2010 10th IEEE International Conference on Computer and Information Technology.
[2] Richard W. Vuduc, et al. Model-driven autotuning of sparse matrix-vector multiply on GPUs, 2010, PPoPP '10.
[3] Rajesh Bordawekar, et al. Optimizing Sparse Matrix-Vector Multiplication on GPUs, 2009.
[4] Samuel Williams, et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms, 2009, Parallel Comput.
[5] Ping Guo, et al. Accurate CUDA performance modeling for sparse matrix-vector multiplication, 2012, 2012 International Conference on High Performance Computing & Simulation (HPCS).
[6] Victor Eijkhout, et al. Self-Adapting Linear Algebra Algorithms and Software, 2005, Proceedings of the IEEE.
[7] Yao Zhang, et al. A quantitative performance analysis model for GPU architectures, 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[8] Richard W. Vuduc, et al. Sparsity: Optimization Framework for Sparse Matrix Kernels, 2004, Int. J. High Perform. Comput. Appl.
[9] Hyesoon Kim, et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, 2009, ISCA '09.
[10] Dominik Grewe, et al. Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation, 2011, GPGPU-4.
[11] Jack J. Dongarra, et al. Optimizing matrix multiplication for a short-vector SIMD architecture - CELL processor, 2009, Parallel Comput.
[12] Wei Xue, et al. Performance modeling and optimization of sparse matrix-vector multiplication on NVIDIA CUDA platform, 2011, The Journal of Supercomputing.
[13] Michael Garland, et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[14] Wen-mei W. Hwu, et al. Program optimization space pruning for a multithreaded GPU, 2008, CGO '08.
[15] Srinivasan Parthasarathy, et al. Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining, 2011, Proc. VLDB Endow.
[16] Liqiang Wang, et al. Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs, 2010, 2010 International Conference on Computational and Information Sciences.
[17] Ping Guo, et al. A Performance Modeling and Optimization Analysis Tool for Sparse Matrix-Vector Multiplication on GPUs, 2014, IEEE Transactions on Parallel and Distributed Systems.
[18] Yuping Zhang, et al. Optimizing sparse matrix-vector multiplication on CUDA, 2010, 2010 2nd International Conference on Education Technology and Computer.
[19] William Gropp, et al. An adaptive performance modeling tool for GPU architectures, 2010, PPoPP '10.
[20] K. Srinathan, et al. A performance prediction model for the CUDA GPGPU platform, 2009, 2009 International Conference on High Performance Computing (HiPC).
[21] David R. Kaeli, et al. Exploring the multiple-GPU design space, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[22] He Huang, et al. A model-driven partitioning and auto-tuning integrated framework for sparse matrix-vector multiplication on GPUs, 2011.