Benchmarking multi‐GPU applications on modern multi‐GPU integrated systems
[1] M. Benzi. Preconditioning techniques for large linear systems: a survey, 2002.
[2] Y. Saad, et al. Iterative solution of linear systems in the 20th century, 2000.
[3] Massimo Bernaschi, et al. Efficient breadth first search on multi-GPU systems, 2013, J. Parallel Distributed Comput.
[4] Sreeram Potluri, et al. GPUDirect Async: Exploring GPU synchronous communication techniques for InfiniBand clusters, 2018, J. Parallel Distributed Comput.
[5] Massimo Bernaschi, et al. GPU Peer-to-Peer Techniques Applied to a Cluster Interconnect, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[6] Alexander Sergeev, et al. Horovod: fast and easy distributed deep learning in TensorFlow, 2018, arXiv.
[7] Massimiliano Fatica, et al. Implementing the Himeno benchmark with CUDA on GPU clusters, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[8] Sayantan Sur, et al. MPI Alltoall Personalized Exchange on GPGPU Clusters: Design Alternatives and Benefit, 2011, 2011 IEEE International Conference on Cluster Computing.
[9] Massimo Bernaschi, et al. A Dynamic Pattern Factored Sparse Approximate Inverse Preconditioner on Graphics Processing Units, 2019, SIAM J. Sci. Comput.
[10] Carlo Janna, et al. Adaptive Pattern Research for Block FSAI Preconditioning, 2011, SIAM J. Sci. Comput.
[11] L. Kolotilina, et al. Factorized Sparse Approximate Inverse Preconditionings I. Theory, 1993, SIAM J. Matrix Anal. Appl.
[12] Dhabaleswar K. Panda, et al. Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs, 2013, 2013 42nd International Conference on Parallel Processing.
[13] Massimo Bernaschi, et al. Highly optimized simulations on single- and multi-GPU systems of the 3D Ising spin glass model, 2014, Comput. Phys. Commun.
[14] Everett H. Phillips, et al. An introduction to multi-GPU programming for physicists, 2012.
[15] Dhabaleswar K. Panda, et al. Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters, 2014, 2014 21st International Conference on High Performance Computing (HiPC).
[16] Sayantan Sur, et al. MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters, 2011, Computer Science - Research and Development.