Benchmarking multi-GPU applications on modern multi-GPU integrated systems

Massimo Bernaschi | Davide Rossetti | Elena Agostini

[1] M. Benzi. Preconditioning techniques for large linear systems: a survey, 2002.

[2] Y. Saad, et al. Iterative solution of linear systems in the 20th century, 2000.

[3] Massimo Bernaschi, et al. Efficient breadth first search on multi-GPU systems, 2013, J. Parallel Distributed Comput.

[4] Sreeram Potluri, et al. GPUDirect Async: Exploring GPU synchronous communication techniques for InfiniBand clusters, 2018, J. Parallel Distributed Comput.

[5] Massimo Bernaschi, et al. GPU Peer-to-Peer Techniques Applied to a Cluster Interconnect, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum.

[6] Alexander Sergeev, et al. Horovod: fast and easy distributed deep learning in TensorFlow, 2018, ArXiv.

[7] Massimiliano Fatica, et al. Implementing the Himeno benchmark with CUDA on GPU clusters, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[8] Sayantan Sur, et al. MPI Alltoall Personalized Exchange on GPGPU Clusters: Design Alternatives and Benefit, 2011, 2011 IEEE International Conference on Cluster Computing.

[9] Massimo Bernaschi, et al. A Dynamic Pattern Factored Sparse Approximate Inverse Preconditioner on Graphics Processing Units, 2019, SIAM J. Sci. Comput.

[10] Carlo Janna, et al. Adaptive Pattern Research for Block FSAI Preconditioning, 2011, SIAM J. Sci. Comput.

[11] L. Kolotilina, et al. Factorized Sparse Approximate Inverse Preconditionings I. Theory, 1993, SIAM J. Matrix Anal. Appl.

[12] Dhabaleswar K. Panda, et al. Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs, 2013, 2013 42nd International Conference on Parallel Processing.

[13] Massimo Bernaschi, et al. Highly optimized simulations on single- and multi-GPU systems of the 3D Ising spin glass model, 2014, Comput. Phys. Commun.

[14] Everett H. Phillips, et al. An introduction to multi-GPU programming for physicists, 2012.

[15] Dhabaleswar K. Panda, et al. Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters, 2014, 2014 21st International Conference on High Performance Computing (HiPC).

[16] Sayantan Sur, et al. MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters, 2011, Computer Science - Research and Development.