Semiautomatic Acceleration of Sparse Matrix-Vector Product Using OpenACC
[1] Hermann J. Eberl, et al. OpenACC Parallelisation for Diffusion Problems, Applied to Temperature Distribution on a Honeycomb Around the Bee Brood: A Worked Example Using BiCGSTAB, 2013, PPAM.
[2] Hiroshi Okuda, et al. Effect of GPU Communication-Hiding for SPMV Using OpenACC, 2014.
[3] Samuel Williams, et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms, 2009, Parallel Comput.
[4] Gerhard Wellein, et al. Sparse Matrix-vector Multiplication on GPGPU Clusters: A New Storage Format and a Scalable Implementation, 2011, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[5] Ester M. Garzón, et al. Improving the Performance of the Sparse Matrix Vector Product with GPUs, 2010, 2010 10th IEEE International Conference on Computer and Information Technology.
[6] Kiran Kumar Matam, et al. Accelerating Sparse Matrix Vector Multiplication in Iterative Methods Using GPU, 2011, 2011 International Conference on Parallel Processing.
[7] Michael Wolfe, et al. Implementing the PGI Accelerator model, 2010, GPGPU-3.
[8] Rohit Chandra, et al. Parallel programming in OpenMP, 2000.
[9] Timothy A. Davis, et al. The University of Florida sparse matrix collection, 2011, TOMS.
[10] Francisco de Sande, et al. A preliminary evaluation of OpenACC implementations, 2012, The Journal of Supercomputing.
[11] Ami Marowka. Parallel computing on any desktop, 2007, CACM.
[12] Kevin Skadron, et al. Scalable parallel programming, 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[13] Yousef Saad, et al. GPU-accelerated preconditioned iterative linear solvers, 2013, The Journal of Supercomputing.
[14] Cheng Wang, et al. A Validation Testsuite for OpenACC 1.0, 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[15] Tomás F. Pena, et al. Using sampled information: is it enough for the sparse matrix–vector product locality optimization?, 2014, Concurr. Comput. Pract. Exp.
[16] Hai Jin, et al. A segment‐based sparse matrix–vector multiplication on CUDA, 2014, Concurr. Comput. Pract. Exp.
[17] Jack Dongarra, et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, 2009.
[18] Beata Bylina, et al. Performance analysis of multicore and multinodal implementation of SpMV operation, 2014, 2014 Federated Conference on Computer Science and Information Systems.
[19] Jonas Koko, et al. Parallel preconditioned conjugate gradient algorithm on GPU, 2012, J. Comput. Appl. Math.
[20] Ken A. Hawick, et al. Exploiting graphical processing units for data-parallel scientific applications, 2009.
[21] Janusz S. Kowalik, et al. Using OpenCL - Programming Massively Parallel Computers, 2012, Advances in Parallel Computing.
[22] Sunita Chandrasekaran, et al. Exploring Programming Multi-GPUs Using OpenMP and OpenACC-Based Hybrid Model, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum.
[23] Putt Sakdhnagool, et al. Evaluating Performance Portability of OpenACC, 2014, LCPC.
[24] Jack J. Dongarra, et al. Accelerating GPU Kernels for Dense Linear Algebra, 2010, VECPAR.
[25] Richard F. Barrett, et al. Matrix Market: a web resource for test matrix collections, 1996, Quality of Numerical Software.
[26] Yousef Saad, et al. Iterative methods for sparse linear systems, 2003.