RGEM: A Responsive GPGPU Execution Model for Runtime Engines

General-purpose computing on graphics processing units, also known as GPGPU, is a burgeoning technique for accelerating parallel programs. Applying it to real-time applications, however, requires additional support for timely execution. In particular, the non-preemptive nature of GPGPU operations, namely copying data to and from device memory and launching code onto the device, must be managed in a timely manner. In this paper, we present a responsive GPGPU execution model (RGEM), a user-space runtime solution that protects the response times of high-priority GPGPU tasks from competing workloads. RGEM splits a memory-copy transaction into multiple chunks so that preemption points appear at chunk boundaries. It also ensures that only the highest-priority GPGPU task launches code onto the device at any given time, avoiding the performance interference caused by concurrent launches. A prototype implementation of an RGEM-based CUDA runtime engine is provided to evaluate the real-world impact of RGEM. Our experiments demonstrate that RGEM protects the response times of high-priority GPGPU tasks, whereas without RGEM support those response times grow in an unbounded fashion as the data sizes of the competing workloads increase.
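
To illustrate the chunked memory-copy idea described above, the sketch below divides a host-to-device transfer into fixed-size chunks and evaluates a scheduling hook at every chunk boundary. This is a minimal illustration of the technique, not the actual RGEM code: the function name `chunked_memcpy_h2d`, the `chunk` parameter, and the `higher_priority_pending()` / `wait_until_highest_priority()` hooks are hypothetical placeholders standing in for the runtime's priority bookkeeping; only the CUDA runtime calls (`cudaMemcpyAsync`, `cudaStreamSynchronize`) are real API.

```c
#include <cuda_runtime.h>
#include <stddef.h>

/* Placeholder for the runtime's scheduler query: a real RGEM-style
 * runtime would consult its queue of pending GPGPU tasks here.
 * Stubbed out so the sketch is self-contained. */
static int higher_priority_pending(void) { return 0; }

/* Placeholder for blocking until this task is again the
 * highest-priority one eligible to use the device. */
static void wait_until_highest_priority(void) { }

/* Copy 'size' bytes host-to-device in chunks of at most 'chunk' bytes,
 * creating a preemption point at every chunk boundary. */
static cudaError_t chunked_memcpy_h2d(void *dst, const void *src,
                                      size_t size, size_t chunk,
                                      cudaStream_t stream)
{
    size_t offset = 0;
    while (offset < size) {
        size_t len = (size - offset < chunk) ? (size - offset) : chunk;

        cudaError_t err = cudaMemcpyAsync((char *)dst + offset,
                                          (const char *)src + offset,
                                          len, cudaMemcpyHostToDevice,
                                          stream);
        if (err != cudaSuccess)
            return err;

        /* Wait for this chunk so the copy engine is idle at the
         * boundary; a higher-priority task can take over here. */
        err = cudaStreamSynchronize(stream);
        if (err != cudaSuccess)
            return err;

        if (higher_priority_pending())
            wait_until_highest_priority();  /* preemption point */

        offset += len;
    }
    return cudaSuccess;
}
```

In such a scheme the chunk size trades responsiveness for throughput: smaller chunks expose preemption points more often but add per-chunk synchronization overhead, so a real runtime would tune it rather than hard-code it.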
