Benchmarking multi‐GPU applications on modern multi‐GPU integrated systems

[1] M. Benzi. Preconditioning techniques for large linear systems: a survey, 2002.

[2] Y. Saad, et al. Iterative solution of linear systems in the 20th century, 2000.

[3] Massimo Bernaschi, et al. Efficient breadth first search on multi-GPU systems, 2013, J. Parallel Distributed Comput.

[4] Sreeram Potluri, et al. GPUDirect Async: Exploring GPU synchronous communication techniques for InfiniBand clusters, 2018, J. Parallel Distributed Comput.

[5] Massimo Bernaschi, et al. GPU Peer-to-Peer Techniques Applied to a Cluster Interconnect, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.

[6] Alexander Sergeev, et al. Horovod: fast and easy distributed deep learning in TensorFlow, 2018, ArXiv.

[7] Massimiliano Fatica, et al. Implementing the Himeno benchmark with CUDA on GPU clusters, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[8] Sayantan Sur, et al. MPI Alltoall Personalized Exchange on GPGPU Clusters: Design Alternatives and Benefit, 2011, 2011 IEEE International Conference on Cluster Computing.

[9] Massimo Bernaschi, et al. A Dynamic Pattern Factored Sparse Approximate Inverse Preconditioner on Graphics Processing Units, 2019, SIAM J. Sci. Comput.

[10] Carlo Janna, et al. Adaptive Pattern Research for Block FSAI Preconditioning, 2011, SIAM J. Sci. Comput.

[11] L. Kolotilina, et al. Factorized Sparse Approximate Inverse Preconditionings I. Theory, 1993, SIAM J. Matrix Anal. Appl.

[12] Dhabaleswar K. Panda, et al. Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs, 2013, 2013 42nd International Conference on Parallel Processing.

[13] Massimo Bernaschi, et al. Highly optimized simulations on single- and multi-GPU systems of the 3D Ising spin glass model, 2014, Comput. Phys. Commun.

[14] Everett H. Phillips, et al. An introduction to multi-GPU programming for physicists, 2012.

[15] Dhabaleswar K. Panda, et al. Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters, 2014, 2014 21st International Conference on High Performance Computing (HiPC).

[16] Sayantan Sur, et al. MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters, 2011, Computer Science - Research and Development.