Idempotence-Based Preemptive GPU Kernel Scheduling for Embedded Systems

Mission-critical embedded systems simultaneously run multiple GPU-computing tasks with different criticality and timeliness requirements. Considerable research effort has been dedicated to supporting the preemptive priority scheduling of GPU kernels. However, hardware-supported preemption leads to lengthy scheduling delays and complicated designs, and most software approaches depend on restructured kernels voluntarily yielding GPU resources. We propose a preemptive GPU-kernel scheduling scheme that harnesses the idempotence property of kernels. The proposed scheme identifies idempotent kernels through static source analysis. A kernel that is not idempotent is transactionized at the operating system level. Both idempotent and transactionized kernels can be aborted at any point during their execution and rolled back to their initial state for reexecution. Therefore, low-priority kernel instances can be preempted in favor of high-priority kernel instances and reexecuted after the GPU becomes available again. Our evaluation using the Rodinia benchmark suite showed that the proposed approach limits the preemption delay to 18 μs at the 99.9th percentile, with an average delay in execution time of less than 10% for high-priority tasks under a heavy load in most cases.
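
The abort-and-reexecute idea can be illustrated with a small CPU-side simulation. The sketch below is not the authors' implementation: the kernel functions, the `run` helper, and the snapshot-based rollback are all hypothetical names and mechanisms chosen for illustration, and the abstract does not specify how transactionization is realized at the operating system level. It only shows why a kernel that never overwrites its own inputs can be restarted safely after an abort, while a kernel that updates data in place needs its initial state restored first.

```python
import copy

def vector_add(mem):
    # Idempotent: reads a and b, writes only c, never overwrites its inputs.
    for i in range(len(mem["a"])):
        mem["c"][i] = mem["a"][i] + mem["b"][i]
        yield  # a point at which the kernel could be aborted

def scale_in_place(mem):
    # Not idempotent: overwrites its own input x.
    for i in range(len(mem["x"])):
        mem["x"][i] = mem["x"][i] * 2
        yield

def run(kernel, mem, abort_after=None, transactionize=False):
    """Run a kernel to completion, optionally aborting it once first.

    Assumption for illustration: "transactionizing" is modeled as taking
    a snapshot of memory before launch and restoring it on abort.
    """
    snapshot = copy.deepcopy(mem) if transactionize else None
    if abort_after is not None:
        steps = kernel(mem)
        for _ in range(abort_after):
            next(steps, None)
        # Preempted here by a higher-priority kernel; partial work is abandoned.
        if transactionize:
            mem.clear()
            mem.update(copy.deepcopy(snapshot))
    # Re-execute from the beginning once the GPU is available again.
    for _ in kernel(mem):
        pass
    return mem

# Idempotent kernel: abort and restart gives the same result as a clean run.
clean = run(vector_add, {"a": [1, 2, 3], "b": [10, 20, 30], "c": [0, 0, 0]})
retried = run(vector_add, {"a": [1, 2, 3], "b": [10, 20, 30], "c": [0, 0, 0]},
              abort_after=2)

# Non-idempotent kernel: a naive restart doubles some elements twice.
naive = run(scale_in_place, {"x": [1, 2, 3]}, abort_after=2)

# Non-idempotent kernel with rollback: the result is correct again.
rolled_back = run(scale_in_place, {"x": [1, 2, 3]}, abort_after=2,
                  transactionize=True)
```

In this model the idempotent kernel pays no rollback cost at all, which is the motivation for distinguishing idempotent kernels up front and transactionizing only the rest.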
