Memory bandwidth optimization of SpMV on GPGPUs
Chenggang Clarence Yan | Hui Yu | Jian Yin | Weizhi Xu | Yuxuan Wang | Bochuan Chen | Yingping Zhang | Zhu Tian