OMB-UM: Design, Implementation, and Evaluation of CUDA Unified Memory Aware MPI Benchmarks
Hari Subramoni | Ammar Ahmad Awan | Ching-Hsiang Chu | Karthik Vadambacheri Manian | Kawthar Shafie Khorassani
[1] Dhabaleswar K. Panda, et al. Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation, 2018, 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).
[2] Dhabaleswar K. Panda, et al. CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC, 2016, 2016 IEEE 23rd International Conference on High Performance Computing (HiPC).
[3] Dhabaleswar K. Panda, et al. S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters, 2017, PPoPP.
[4] Joshua A. Anderson, et al. General purpose molecular dynamics simulations fully implemented on graphics processing units, 2008, J. Comput. Phys.
[5] Dhabaleswar K. Panda, et al. Designing high performance communication runtime for GPU managed memory: early experiences, 2016, GPGPU@PPoPP.
[6] Hal Finkel, et al. Benchmarking and Evaluating Unified Memory for OpenMP GPU Offloading, 2017, LLVM-HPC@SC.
[7] Dhabaleswar K. Panda, et al. GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation, 2014, IEEE Transactions on Parallel and Distributed Systems.
[8] Chao Liu, et al. Using High Level GPU Tasks to Explore Memory and Communications Options on Heterogeneous Platforms, 2017, SEM4HPC@HPDC.
[9] Dhabaleswar K. Panda, et al. OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters, 2012, EuroMPI.
[10] Paweł Czarnul, et al. Performance evaluation of Unified Memory with prefetching and oversubscription for selected parallel CUDA applications on NVIDIA Pascal and Volta GPUs, 2019, The Journal of Supercomputing.
[11] Dhabaleswar K. Panda, et al. Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters, 2014, 2014 21st International Conference on High Performance Computing (HiPC).
[12] George Bosilca, et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, 2004, PVM/MPI.
[13] Dhabaleswar K. Panda, et al. Performance Evaluation of MPI Libraries on GPU-Enabled OpenPOWER Architectures: Early Experiences, 2019, ISC Workshops.
[14] David I. August, et al. Automatic CPU-GPU communication management and optimization, 2011, PLDI '11.
[15] Dhabaleswar K. Panda, et al. Characterizing CUDA Unified Memory (UM)-Aware MPI Designs on Modern GPU Architectures, 2019, GPGPU@ASPLOS.