CSR2: A New Format for SIMD-accelerated SpMV
[1] Razvan Nane et al. Sparstition: A Partitioning Scheme for Large-Scale Sparse Matrix Vector Multiplication on FPGA, 2019, 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP).
[2] Qiao Sun et al. Bandwidth Reduced Parallel SpMV on the SW26010 Many-Core Platform, 2018, ICPP.
[3] Graham Markall. Accelerating Unstructured Mesh Computational Fluid Dynamics on the NVidia Tesla GPU Architecture, 2011.
[4] Kenli Li et al. Performance Analysis and Optimization for SpMV on GPU Using Probabilistic Modeling, 2015, IEEE Transactions on Parallel and Distributed Systems.
[5] Gerhard Wellein et al. A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units, 2013, SIAM J. Sci. Comput.
[6] Jie Liu et al. VBSF: a new storage format for SIMD sparse matrix–vector multiplication on modern processors, 2019, The Journal of Supercomputing.
[7] Frédéric Magoulès et al. Iterative Methods for Sparse Linear Systems on Graphics Processing Unit, 2012, 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems.
[8] Brian Vinter et al. CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication, 2015, ICS.
[9] Lu Yao et al. Implementing Sparse Matrix-Vector multiplication using CUDA based on a hybrid sparse matrix format, 2010, 2010 International Conference on Computer Application and System Modeling (ICCASM 2010).
[10] Kurt Keutzer et al. clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs, 2012, ICS '12.
[11] Yu Wang et al. FPGA and GPU implementation of large scale SpMV, 2010, 2010 IEEE 8th Symposium on Application Specific Processors (SASP).
[12] Srinivasan Parthasarathy et al. Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Song Han et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[14] Shengen Yan et al. yaSpMV: yet another SpMV framework on GPUs, 2014, PPoPP.
[15] Zhen Jia et al. CVR: efficient vectorization of SpMV on x86 processors, 2018, CGO.
[16] Michal Rewienski et al. GPU-Accelerated LOBPCG Method with Inexact Null-Space Filtering for Solving Generalized Eigenvalue Problems in Computational Electromagnetics Analysis with Higher-Order FEM, 2017.
[17] Pradeep Dubey et al. GraphMat: High performance graph analytics made productive, 2015, Proc. VLDB Endow.
[18] Tao Li et al. CASpMV: A Customized and Accelerative SpMV Framework for the Sunway TaihuLight, 2021, IEEE Transactions on Parallel and Distributed Systems.
[19] Zheng Xiao et al. hpSpMV: A Heterogeneous Parallel Computing Scheme for SpMV on the Sunway TaihuLight Supercomputer, 2019, 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
[20] Feng Yan et al. Efficient PageRank and SpMV Computation on AMD GPUs, 2010, 2010 39th International Conference on Parallel Processing.
[21] Dirk Schmidl et al. Assessing the Performance of OpenMP Programs on the Intel Xeon Phi, 2013, Euro-Par.
[22] P. Sadayappan et al. Effective Machine Learning Based Format Selection and Performance Modeling for SpMV on GPUs, 2018, 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[23] Ümit V. Çatalyürek et al. Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi, 2013, PPAM.
[24] Tetsuya Sakurai et al. Block Krylov-type complex moment-based eigensolvers for solving generalized eigenvalue problems, 2017, Numerical Algorithms.
[25] Jack J. Dongarra et al. An extended set of FORTRAN basic linear algebra subprograms, 1988, TOMS.
[26] Paul H. J. Kelly et al. GiMMiK - Generating bespoke matrix multiplication kernels for accelerators: Application to high-order Computational Fluid Dynamics, 2016, Comput. Phys. Commun.
[27] John D. Owens et al. Gunrock, 2017, ACM Trans. Parallel Comput.
[28] Francisco Vázquez et al. A new approach for sparse matrix vector product on NVIDIA GPUs, 2011, Concurr. Comput. Pract. Exp.
[29] Nectarios Koziris et al. CSX: an extended compression format for spmv on shared memory systems, 2011, PPoPP '11.