GPU Scheduling on the NVIDIA TX2: Hidden Details Revealed

The push towards fielding autonomous-driving capabilities in vehicles is happening at breakneck speed. Semi-autonomous features are becoming increasingly common, and fully autonomous vehicles are optimistically forecast to be widely available in just a few years. Today, graphics processing units (GPUs) are seen as a key technology in this push towards greater autonomy. However, realizing full autonomy in mass-production vehicles will require stringent certification processes. Currently available GPUs pose challenges in this regard, as they tend to be closed-source “black boxes” with features that are not publicly disclosed. For certification to be tenable, such features must be documented. This paper reports on one such documentation effort, directed at the NVIDIA TX2, one of the most prominent GPU-enabled platforms marketed today for autonomous systems. It reveals important aspects of the TX2’s GPU scheduler, as discerned through experimental testing and validation.
