Characterizing CUDA Unified Memory (UM)-Aware MPI Designs on Modern GPU Architectures
Dhabaleswar K. Panda | Hari Subramoni | Ching-Hsiang Chu | Karthik Vadambacheri Manian | A. A. Ammar | Amit Ruhela
[1] Dhabaleswar K. Panda, et al. GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation, 2014, IEEE Transactions on Parallel and Distributed Systems.
[2] Chao Liu, et al. Using High Level GPU Tasks to Explore Memory and Communications Options on Heterogeneous Platforms, 2017, SEM4HPC@HPDC.
[3] George Bosilca, et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, 2004, PVM/MPI.
[4] Dhabaleswar K. Panda, et al. Designing high performance communication runtime for GPU managed memory: early experiences, 2016, GPGPU@PPoPP.
[5] Raphael Landaverde, et al. An investigation of Unified Memory Access performance in CUDA, 2014, 2014 IEEE High Performance Extreme Computing Conference (HPEC).
[6] Dhabaleswar K. Panda, et al. OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training, 2018, 2018 IEEE 25th International Conference on High Performance Computing (HiPC).
[7] Satoshi Matsuoka, et al. DRAGON: Breaking GPU Memory Capacity Limits with Direct NVM Access, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[8] Dhabaleswar K. Panda, et al. Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?, 2017, EuroMPI.
[9] Pawel Czarnul, et al. Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications, 2017, The Journal of Supercomputing.
[10] Gunter Saake, et al. Memory Management Strategies in CPU/GPU Database Systems: A Survey, 2018, BDAS.
[11] Dhabaleswar K. Panda, et al. S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters, 2017, PPoPP.
[12] Marisa López-Vallejo, et al. A Performance Study of CUDA UVM versus Manual Optimizations in a Real-World Setup: Application to a Monte Carlo Wave-Particle Event-Based Interaction Model, 2016, IEEE Transactions on Parallel and Distributed Systems.
[13] Dhabaleswar K. Panda, et al. CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC, 2016, 2016 IEEE 23rd International Conference on High Performance Computing (HiPC).
[14] Dhabaleswar K. Panda, et al. Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs, 2013, 2013 42nd International Conference on Parallel Processing.
[15] Mahmoud Al-Ayyoub, et al. Accelerating Levenshtein and Damerau edit distance algorithms using GPU with unified memory, 2017, 2017 8th International Conference on Information and Communication Systems (ICICS).
[16] Gianluca Francini, et al. GPU-only unified ConvMM layer for neural classifiers, 2017, 2017 4th International Conference on Control, Decision and Information Technologies (CoDIT).
[17] Joshua A. Anderson, et al. General purpose molecular dynamics simulations fully implemented on graphics processing units, 2008, J. Comput. Phys.
[18] Hal Finkel, et al. Benchmarking and Evaluating Unified Memory for OpenMP GPU Offloading, 2017, LLVM-HPC@SC.
[19] Peng Wang, et al. High-Frequency Nonlinear Earthquake Simulations on Petascale Heterogeneous Supercomputers, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] David R. Kaeli, et al. UMH, 2016, ACM Trans. Archit. Code Optim.
[21] Alexandra Fedorova, et al. Analyzing memory management methods on integrated CPU-GPU systems, 2017, ISMM.
[22] Dhabaleswar K. Panda, et al. Optimizing MPI Communication on Multi-GPU Systems Using CUDA Inter-Process Communication, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[23] Simon See, et al. An Evaluation of Unified Memory Technology on NVIDIA GPUs, 2015, 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[24] Danilo Medeiros Eler, et al. Performance Evaluation of Data Migration Methods Between the Host and the Device in CUDA-Based Programming, 2016.