Protecting Real-Time GPU Applications on Integrated CPU-GPU SoC Platforms

Integrated CPU-GPU architecture provides excellent acceleration capabilities for data parallel applications on embedded platforms while meeting the size, weight and power (SWaP) requirements. However, sharing of main memory between CPU applications and GPU kernels can severely affect the execution of GPU kernels and diminish the performance gain provided by GPU. In the NVIDIA Tegra TK1 platform which has the integrated CPU-GPU architecture, we noticed that in the worst case scenario, the GPU kernels can suffer as much as 4X slowdown in the presence of co-running memory intensive CPU applications compared to their solo execution. In this paper, we propose a kernel mechanism called BWLOCK++ which can be used to protect the performance of GPU kernels from corunning memory intensive CPU applications. Our preliminary investigations show that by using BWLOCK++, the performance slowdown of GPU kernels in the presence of memory contention can be decreased by up-to 63%.

[1]  Shige Wang,et al.  A server-based approach for predictable GPU access control , 2017, 2017 IEEE 23rd International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA).

[2]  Heechul Yun,et al.  Taming Non-Blocking Caches to Improve Isolation in Multicore Real-Time Systems , 2016, 2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS).

[3]  Shinpei Kato,et al.  An Open Approach to Autonomous Vehicles , 2015, IEEE Micro.

[4]  W KecklerStephen,et al.  Page Placement Strategies for GPUs within Heterogeneous Memory Systems , 2015 .

[5]  Waqar Ali,et al.  BWLOCK: A Dynamic Memory Access Control Framework for Soft Real-Time Applications on Multicore Platforms , 2017, IEEE Transactions on Computers.

[6]  Ming Yang,et al.  GPU Scheduling on the NVIDIA TX2: Hidden Details Revealed , 2017, 2017 IEEE Real-Time Systems Symposium (RTSS).

[7]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[8]  Wen-mei W. Hwu,et al.  Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .

[9]  Shinpei Kato,et al.  Gdev: First-Class GPU Resource Management in the Operating System , 2012, USENIX Annual Technical Conference.

[10]  Marco Caccamo,et al.  A Predictable Execution Model for COTS-Based Embedded Systems , 2011, 2011 17th IEEE Real-Time and Embedded Technology and Applications Symposium.

[11]  Lui Sha,et al.  Priority Inheritance Protocols: An Approach to Real-Time Synchronization , 1990, IEEE Trans. Computers.

[12]  Cong Liu,et al.  GPES: a preemptive execution system for GPGPU computing , 2015, 21st IEEE Real-Time and Embedded Technology and Applications Symposium.

[13]  Luca Benini,et al.  GPUguard: Towards supporting a predictable execution model for heterogeneous SoC , 2017, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.

[14]  Lui Sha,et al.  MemGuard: Memory bandwidth reservation system for efficient performance isolation in multi-core platforms , 2013, 2013 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS).

[15]  Heechul Yun,et al.  Addressing isolation challenges of non-blocking caches for multicore real-time systems , 2017, Real-Time Systems.

[16]  James H. Anderson,et al.  GPUSync: A Framework for Real-Time GPU Management , 2013, 2013 IEEE 34th Real-Time Systems Symposium.

[17]  Alan Burns,et al.  Applying new scheduling theory to static priority pre-emptive scheduling , 1993, Softw. Eng. J..

[18]  Ming Yang,et al.  An Evaluation of the NVIDIA TX1 for Supporting Real-Time Computer-Vision Workloads , 2017, 2017 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS).

[19]  Ming Yang,et al.  Inferring the Scheduling Policies of an Embedded CUDA GPU , 2017 .

[20]  Stephen W. Keckler,et al.  Page Placement Strategies for GPUs within Heterogeneous Memory Systems , 2015, ASPLOS.

[21]  Shinpei Kato,et al.  TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments , 2011, USENIX Annual Technical Conference.