Interplay between Hardware Prefetcher and Page Eviction Policy in CPU-GPU Unified Virtual Memory
Rami G. Melhem | Jun Yang | Debashis Ganguly | Ziyu Zhang