Computing the sparse matrix vector product using block-based kernels without zero padding on processors with AVX-512 instructions
[1] Youcef Saad,et al. A Basic Tool Kit for Sparse Matrix Computations , 1990 .
[2] Sivan Toledo,et al. Improving the memory-system performance of sparse-matrix vector multiplication , 1997, IBM J. Res. Dev..
[3] Fan Ye,et al. A Study of SpMV Implementation Using MPI and OpenMP on Intel Many-Core Architecture , 2014, VECPAR.
[4] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005 .
[5] Samuel Williams,et al. Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[6] A. N. Yzelman. Generalised vectorisation for sparse matrix-vector multiplication , 2015, IA3@SC.
[7] Richard Vuduc,et al. Automatic performance tuning of sparse matrix kernels , 2003 .
[8] Richard Barrett,et al. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods , 1994, Other Titles in Applied Mathematics.
[9] Alston S. Householder,et al. Handbook for Automatic Computation , 1960, Comput. J..
[10] E. Cuthill,et al. Reducing the bandwidth of sparse symmetric matrices , 1969, ACM '69.
[11] P. Sadayappan,et al. On improving the performance of sparse matrix-vector multiplication , 1997, Proceedings Fourth International Conference on High-Performance Computing.
[12] Bérenger Bramas,et al. Optimization and parallelization of the boundary element method for the wave equation in time domain. (Optimisation et parallèlisation de la méthode des élements frontières pour l'équation des ondes dans le domaine temporel) , 2016 .
[13] Xing Liu,et al. Efficient sparse matrix-vector multiplication on x86-based many-core processors , 2013, ICS '13.
[14] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[15] Yousef Saad,et al. Iterative methods for sparse linear systems , 2003 .
[16] Eun Im,et al. Optimizing the Performance of Sparse Matrix-Vector Multiplication , 2000 .
[17] Brian Vinter,et al. CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication , 2015, ICS.
[18] Hyun Jin Moon,et al. Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure , 2005, HPCC.
[19] A. Pinar,et al. Improving Performance of Sparse Matrix-Vector Multiplication , 1999, ACM/IEEE SC 1999 Conference (SC'99).
[20] Francisco F. Rivera,et al. Performance optimization of irregular codes based on the combination of reordering and blocking techniques , 2005, Parallel Comput..
[21] Gerhard Wellein,et al. A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units , 2013, SIAM J. Sci. Comput..
[22] Katherine A. Yelick,et al. Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY , 2001, International Conference on Computational Science.
[23] Ramaseshan Kannan. Efficient sparse matrix multiple-vector multiplication using a bitmapped format , 2013, 20th Annual International Conference on High Performance Computing.
[24] Richard W. Vuduc,et al. Sparsity: Optimization Framework for Sparse Matrix Kernels , 2004, Int. J. High Perform. Comput. Appl..