论文信息 - GDM: device memory management for gpgpu computing

GDM: device memory management for gpgpu computing

GPGPUs are evolving from dedicated accelerators towards mainstream commodity computing resources. During the transition, the lack of system management of device memory space on GPGPUs has become a major hurdle. In existing GPGPU systems, device memory space is still managed explicitly by individual applications, which not only increases the burden of programmers but can also cause application crashes, hangs, or low performance. In this paper, we present the design and implementation of GDM, a fully functional GPGPU device memory manager to address the above problems and unleash the computing power of GPGPUs in general-purpose environments. To effectively coordinate the device memory usage of different applications, GDM takes control over device memory allocations and data transfers to and from device memory, leveraging a buffer allocated in each application's virtual memory. GDM utilizes the unique features of GPGPU systems and relies on several effective optimization techniques to guarantee the efficient usage of device memory space and to achieve high performance. We have evaluated GDM and compared it against state-of-the-art GPGPU system software on a range of workloads. The results show that GDM can prevent applications from crashes, including those induced by device memory leaks, and improve system performance by up to 43%.

Shinpei Kato | Xiaoning Ding | Xiaodong Zhang | Rubao Lee | Kaibo Wang

[1] Peter J. Denning,et al. Third Generation Computer Systems , 1971, CSUR.

[2] Vanish Talwar,et al. Pegasus: Coordinated Scheduling for Virtualized Accelerator-based Systems , 2011, USENIX Annual Technical Conference.

[3] John E. Stone,et al. An asymmetric distributed shared memory model for heterogeneous parallel systems , 2010, ASPLOS XV.

[4] Aaftab Munshi,et al. The OpenCL specification , 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).

[5] Feng Ji,et al. RSVM: A Region-based Software Virtual Memory for GPU , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.

[6] Hyesoon Kim. Supporting virtual memory in GPGPU without supporting precise exceptions , 2012, MSPC '12.

[7] Jack J. Purdum,et al. C programming guide , 1983 .

[8] Tom R. Halfhill. NVIDIA's Next-Generation CUDA Compute and Graphics Architecture, Code-Named Fermi, Adds Muscle for Parallel Processing , 2009 .

[9] David I. August,et al. Automatic CPU-GPU communication management and optimization , 2011, PLDI '11.

[10] Joel H. Saltz,et al. Accelerating Pathology Image Data Cross-Comparison on CPU-GPU Hybrid Systems , 2012, Proc. VLDB Endow..

[11] Karthikeyan Sankaralingam,et al. iGPU: Exception support and speculative execution on GPUs , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[12] Jingren Zhou,et al. SCOPE: easy and efficient parallel processing of massive data sets , 2008, Proc. VLDB Endow..

[13] A. Snavely,et al. Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[14] Seungyeop Han,et al. SSLShader: Cheap SSL Acceleration with Commodity Processors , 2011, NSDI.

[15] Yuan Yuan,et al. The Yin and Yang of Processing Data Warehousing Queries on GPU Devices , 2013, Proc. VLDB Endow..

[16] John Poulton. An embedded DRAM for CMOS ASICs , 1997, Proceedings Seventeenth Conference on Advanced Research in VLSI.

[17] Idit Keidar,et al. GPUfs: Integrating a file system with GPUs , 2013, TOCS.

[18] Mark Silberstein,et al. PTask: operating system abstractions to manage GPUs as compute devices , 2011, SOSP.

[19] Shinpei Kato,et al. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments , 2011, USENIX Annual Technical Conference.

[20] Lei Jiang,et al. Die Stacking (3D) Microarchitecture , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[21] Michael R. Macedonia,et al. The GPU Enters Computing's Mainstream , 2003, Computer.

[22] Pradeep Dubey,et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU , 2010, ISCA.

[23] Peter J. Denning,et al. Virtual memory , 1970, CSUR.

[24] Dawson R. Engler,et al. Exterminate all operating system abstractions , 1995, Proceedings 5th Workshop on Hot Topics in Operating Systems (HotOS-V).

[25] Shinpei Kato,et al. Gdev: First-Class GPU Resource Management in the Operating System , 2012, USENIX Annual Technical Conference.

[26] Erik Lindholm,et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.

[27] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[28] Razvan Pascanu,et al. Theano: A CPU and GPU Math Compiler in Python , 2010, SciPy.