Autotuning Sparse Matrix-Vector Multiplication for Multicore
Katherine Yelick | James Demmel | Jong-Ho Byun | Richard Lin
[1] Richard Vuduc,et al. Automatic performance tuning of sparse matrix kernels , 2003 .
[2] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput.
[3] Liqiang Wang,et al. Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs , 2010, 2010 International Conference on Computational and Information Sciences.
[4] Michael M. Wolf,et al. Optimizing Parallel Sparse Matrix-Vector Multiplication by Corner Partitioning , 2008 .
[5] A. Usman,et al. Review of Storage Techniques for Sparse Matrices , 2005, 2005 Pakistan Section Multitopic Conference.
[6] Bruce Hendrickson,et al. Optimizing parallel sparse matrix-vector multiplication by partitioning , 2008 .
[7] Michael Garland,et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[8] Olav Aanes Fagerlund. Multi-core programming with OpenCL: performance and portability: OpenCL in a memory bound scenario , 2010 .
[9] Bora Uçar,et al. On Two-Dimensional Sparse Matrix Partitioning: Models, Methods, and a Recipe , 2010, SIAM J. Sci. Comput.
[10] Eduardo F. D'Azevedo,et al. Vectorized Sparse Matrix Multiply for Compressed Row Storage Format , 2005, International Conference on Computational Science.
[11] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[12] Jesús Carretero,et al. Reordering Algorithms for Increasing Locality on Multicore Processors , 2008, 2008 10th IEEE International Conference on High Performance Computing and Communications.
[13] Katherine A. Yelick,et al. Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY , 2001, International Conference on Computational Science.
[14] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[15] Samuel Williams,et al. A Generalized Framework for Auto-tuning Stencil Computations , 2009 .
[16] Stamatis Vassiliadis,et al. A Hierarchical sparse matrix storage format for vector processors , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[17] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[18] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005 .
[19] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[20] John M. Mellor-Crummey,et al. Optimizing Sparse Matrix–Vector Product Computations Using Unroll and Jam , 2004, Int. J. High Perform. Comput. Appl.
[21] Nectarios Koziris,et al. A Comparative Study of Blocking Storage Methods for Sparse Matrices on Multicore Architectures , 2009, 2009 International Conference on Computational Science and Engineering.
[22] Richard W. Vuduc,et al. Sparsity: Optimization Framework for Sparse Matrix Kernels , 2004, Int. J. High Perform. Comput. Appl.
[23] John R. Gilbert,et al. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks , 2009, SPAA '09.
[24] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2009, Parallel Comput..
[25] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[26] Patrick R. Amestoy,et al. An Approximate Minimum Degree Ordering Algorithm , 1996, SIAM J. Matrix Anal. Appl.
[27] Ian P. King,et al. An automatic reordering scheme for simultaneous equations derived from network systems , 1970 .
[28] Ümit V. Çatalyürek,et al. Decomposing Irregularly Sparse Matrices for Parallel Matrix-Vector Multiplication , 1996, IRREGULAR.
[29] Matteo Frigo,et al. A fast Fourier transform compiler , 1999, SIGP.
[30] Samuel Williams,et al. Auto-tuning performance on multicore computers , 2008 .
[31] Michael Garland,et al. Efficient Sparse Matrix-Vector Multiplication on CUDA , 2008 .
[32] Eun Im,et al. Optimizing the Performance of Sparse Matrix-Vector Multiplication , 2000 .
[33] E. Cuthill,et al. Reducing the bandwidth of sparse symmetric matrices , 1969, ACM '69.
[34] E. Ng,et al. An Efficient Algorithm to Compute Row and Column Counts for Sparse Cholesky Factorization , 1994 .
[35] P. Sadayappan,et al. On improving the performance of sparse matrix-vector multiplication , 1997, Proceedings Fourth International Conference on High-Performance Computing.