RGEM: A Responsive GPGPU Execution Model for Runtime Engines

General-purpose computing on graphics processing units, also known as GPGPU, is a burgeoning technique for accelerating parallel programs. Applying it to real-time applications, however, requires additional support for timely execution. In particular, the non-preemptive nature of GPGPU operations, namely copying data to and from device memory and launching code onto the device, must be managed in a timely manner. In this paper, we present a responsive GPGPU execution model (RGEM), a user-space runtime solution that protects the response times of high-priority GPGPU tasks from competing workloads. RGEM splits a memory-copy transaction into multiple chunks so that preemption points appear at chunk boundaries. It also ensures that only the highest-priority GPGPU task launches code onto the device at any given time, avoiding the performance interference caused by concurrent launches. A prototype implementation of an RGEM-based CUDA runtime engine is provided to evaluate the real-world impact of RGEM. Our experiments demonstrate that RGEM protects the response times of high-priority GPGPU tasks, whereas without RGEM support those response times grow in an unbounded fashion as the data sizes of the competing workloads increase.
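
To illustrate the chunked memory-copy idea described above, the sketch below divides a host-to-device transfer into fixed-size chunks and evaluates a scheduling hook at every chunk boundary. This is a minimal illustration of the technique, not the actual RGEM code: the function name `chunked_memcpy_h2d`, the `chunk` parameter, and the `higher_priority_pending()` / `wait_until_highest_priority()` hooks are hypothetical placeholders standing in for the runtime's priority bookkeeping; only the CUDA runtime calls (`cudaMemcpyAsync`, `cudaStreamSynchronize`) are real API.

```c
#include <cuda_runtime.h>
#include <stddef.h>

/* Placeholder for the runtime's scheduler query: a real RGEM-style
 * runtime would consult its queue of pending GPGPU tasks here.
 * Stubbed out so the sketch is self-contained. */
static int higher_priority_pending(void) { return 0; }

/* Placeholder for blocking until this task is again the
 * highest-priority one eligible to use the device. */
static void wait_until_highest_priority(void) { }

/* Copy 'size' bytes host-to-device in chunks of at most 'chunk' bytes,
 * creating a preemption point at every chunk boundary. */
static cudaError_t chunked_memcpy_h2d(void *dst, const void *src,
                                      size_t size, size_t chunk,
                                      cudaStream_t stream)
{
    size_t offset = 0;
    while (offset < size) {
        size_t len = (size - offset < chunk) ? (size - offset) : chunk;

        cudaError_t err = cudaMemcpyAsync((char *)dst + offset,
                                          (const char *)src + offset,
                                          len, cudaMemcpyHostToDevice,
                                          stream);
        if (err != cudaSuccess)
            return err;

        /* Wait for this chunk so the copy engine is idle at the
         * boundary; a higher-priority task can take over here. */
        err = cudaStreamSynchronize(stream);
        if (err != cudaSuccess)
            return err;

        if (higher_priority_pending())
            wait_until_highest_priority();  /* preemption point */

        offset += len;
    }
    return cudaSuccess;
}
```

In such a scheme the chunk size trades responsiveness for throughput: smaller chunks expose preemption points more often but add per-chunk synchronization overhead, so a real runtime would tune it rather than hard-code it.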
