An Adaptive Framework for Oversubscription Management in CPU-GPU Unified Memory

Hardware support for fault-driven page migration and on-demand memory allocation, along with advancements in the unified memory runtime of modern graphics processing units (GPUs), simplifies memory management in discrete CPU-GPU heterogeneous memory systems and improves programmability. GPUs are widely adopted to accelerate general-purpose applications and are now an integral part of heterogeneous computing platforms ranging from supercomputers to commodity cloud platforms. However, data-intensive applications face device-memory oversubscription because the limited capacity of bandwidth-optimized GPU memory cannot accommodate their growing working sets. The performance overhead under memory oversubscription stems from the thrashing of memory pages over the slow CPU-GPU interconnect. Because applications differ in their compute and memory access patterns, each demands tailored memory management. As a result, the burden of effectively using the many memory management techniques supported by GPU programming libraries and runtimes falls squarely on the application programmer. This paper presents a smart runtime that leverages page-fault and page-migration information to detect the underlying patterns in CPU-GPU interconnect traffic. Based on this online workload characterization, the extended unified memory runtime dynamically selects and applies a suitable policy from a wide array of memory management strategies to mitigate the effects of memory oversubscription. Experimental evaluation shows that the adaptive runtime provides 18% and 30% (geometric-mean) performance improvements across all benchmarks over the default unified memory runtime under 125% and 150% device-memory oversubscription, respectively.
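
As an illustration of the kind of per-application memory management the abstract refers to, the sketch below shows how the standard CUDA unified-memory hints (cudaMallocManaged, cudaMemAdvise, cudaMemPrefetchAsync) could be chosen according to a detected access pattern. The AccessPattern enum, the apply_policy helper, and the hard-coded classification are assumptions introduced for illustration only; the paper's runtime performs this selection online from fault and migration telemetry inside the unified memory runtime rather than in application code.

```cpp
// adaptive_hints.cu -- minimal illustrative sketch, not the paper's runtime.
// Maps a (hypothetically detected) access pattern to a CUDA unified-memory hint.
#include <cuda_runtime.h>
#include <cstdio>

enum class AccessPattern { Streaming, ReadMostly, Irregular };  // assumed classification

// Hypothetical policy selector: pick a unified-memory hint for one allocation.
static void apply_policy(void* buf, size_t bytes, AccessPattern p, int gpu) {
    switch (p) {
    case AccessPattern::Streaming:
        // Bulk prefetch to the GPU hides fault latency for sequential scans.
        cudaMemPrefetchAsync(buf, bytes, gpu, 0);
        break;
    case AccessPattern::ReadMostly:
        // Read-duplicated pages avoid repeated migrations under oversubscription.
        cudaMemAdvise(buf, bytes, cudaMemAdviseSetReadMostly, gpu);
        break;
    case AccessPattern::Irregular:
        // Pin pages on the host and let the GPU access them over the interconnect
        // instead of thrashing them back and forth.
        cudaMemAdvise(buf, bytes, cudaMemAdviseSetPreferredLocation, cudaCpuDeviceId);
        cudaMemAdvise(buf, bytes, cudaMemAdviseSetAccessedBy, gpu);
        break;
    }
}

int main() {
    int gpu = 0;
    cudaGetDevice(&gpu);

    size_t bytes = size_t(1) << 28;            // 256 MiB managed allocation
    float* data = nullptr;
    cudaMallocManaged(&data, bytes);

    // The adaptive runtime would infer the pattern online; here it is hard-coded.
    apply_policy(data, bytes, AccessPattern::Streaming, gpu);

    cudaDeviceSynchronize();
    cudaFree(data);
    printf("applied unified-memory hint\n");
    return 0;
}
```

Applying such hints manually is exactly the per-application burden the proposed runtime aims to lift, since the appropriate choice depends on the workload's access pattern and the degree of oversubscription.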
