Reducing Communication Overhead in Multi-GPU Hybrid Solver for 2D Laplace’s Equation