In-Depth Analyses of Unified Virtual Memory System for GPU Accelerated Computing