On the limits of GPU acceleration
Murat Efe Guney | Richard Vuduc | Aparna Chandramowlishwaran | JeeWhan Choi | Aashay Shringarpure
[1] Leslie Greengard, et al. A fast algorithm for particle simulations, 1987.
[2] James Demmel, et al. Applied Numerical Linear Algebra, 1997.
[3] Lexing Ying, et al. A New Parallel Kernel-Independent Fast Multipole Method, 2003, ACM/IEEE SC 2003 Conference (SC'03).
[4] D. Zorin, et al. A kernel-independent adaptive fast multipole algorithm in two and three dimensions, 2004.
[5] Richard W. Vuduc, et al. Sparsity: Optimization Framework for Sparse Matrix Kernels, 2004, Int. J. High Perform. Comput. Appl.
[6] Katherine Yelick, et al. OSKI: A library of automatically tuned sparse matrix kernels, 2005.
[7] Ramani Duraiswami, et al. Fast multipole methods on graphics processors, 2008, J. Comput. Phys.
[8] David Patterson, et al. The Top 10 Innovations in the New NVIDIA Fermi Architecture, and the Top 3 Next Challenges, 2009.
[9] Richard W. Vuduc, et al. Direct N-body Kernels for Multicore Platforms, 2009, 2009 International Conference on Parallel Processing.
[10] Michael Garland, et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[11] Samuel Williams, et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms, 2009, Parallel Comput.
[12] Richard W. Vuduc, et al. A massively parallel adaptive fast-multipole method on heterogeneous architectures, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[13] Samuel Williams, et al. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[14] Richard W. Vuduc, et al. Model-driven autotuning of sparse matrix-vector multiply on GPUs, 2010, PPoPP '10.
[15] Samuel Williams, et al. Sparse Matrix-Vector Multiplication on Multicore and Accelerators, 2010.
[16] Murat Efe Guney, et al. High-performance direct solution of finite element problems on multi-core processors, 2010.
[17] Eric Darve, et al. Assembly of finite element methods on graphics processors, 2011.