Paper Information - GPU Scheduling on the NVIDIA TX2: Hidden Details Revealed

GPU Scheduling on the NVIDIA TX2: Hidden Details Revealed

The push towards fielding autonomous-driving capabilities in vehicles is happening at breakneck speed. Semi-autonomous features are becoming increasingly common, and fully autonomous vehicles are optimistically forecast to be widely available in just a few years. Today, graphics processing units (GPUs) are seen as a key technology in this push towards greater autonomy. However, realizing full autonomy in mass-production vehicles will necessitate the use of stringent certification processes. Currently available GPUs pose challenges in this regard, as they tend to be closed-source “black boxes” that have features that are not publicly disclosed. For certification to be tenable, such features must be documented. This paper reports on such a documentation effort. This effort was directed at the NVIDIA TX2, which is one of the most prominent GPU-enabled platforms marketed today for autonomous systems. In this paper, important aspects of the TX2’s GPU scheduler are revealed as discerned through experimental testing and validation.

[1] Mohammad Abdullah Al Faruque, et al. Run-Time Scheduling Framework for Event-Driven Applications on a GPU-Based Embedded System, 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[2] Eduardo Tovar, et al. WCET Measurement-based and Extreme Value Theory Characterisation of CUDA Kernels, 2014, RTNS.

[3] Shinpei Kato, et al. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments, 2011, USENIX Annual Technical Conference.

[4] Eduardo Tovar, et al. Measurement-Based Probabilistic Timing Analysis for Graphics Processor Units, 2016, ARCS.

[5] Shinpei Kato, et al. RGEM: A Responsive GPGPU Execution Model for Runtime Engines, 2011, 2011 IEEE 32nd Real-Time Systems Symposium.

[6] Jianlong Zhong, et al. Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling, 2013, IEEE Transactions on Parallel and Distributed Systems.

[7] Ming Yang, et al. An Evaluation of the NVIDIA TX1 for Supporting Real-time Computer-Vision Workloads, 2017.

[8] Andreas Moshovos, et al. Demystifying GPU microarchitecture through microbenchmarking, 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).

[9] F. D. Smith, et al. GPU Sharing for Image Processing in Embedded Real-Time Systems, 2016.

[10] Petru Eles, et al. Systematic detection of memory related performance bottlenecks in GPGPU programs, 2016, J. Syst. Archit.

[11] Konstantinos Bletsas, et al. Faster makespan estimation for GPU threads on a single streaming multiprocessor, 2013, 2013 IEEE 18th Conference on Emerging Technologies & Factory Automation (ETFA).

[12] Kyoung-Don Kang, et al. Supporting Preemptive Task Executions and Memory Copies in GPGPUs, 2012, 2012 24th Euromicro Conference on Real-Time Systems.

[13] James H. Anderson, et al. GPUSync: A Framework for Real-Time GPU Management, 2013, 2013 IEEE 34th Real-Time Systems Symposium.

[14] Shinpei Kato, et al. Supporting Low-Latency CPS Using GPUs and Direct I/O Schemes, 2012, 2012 IEEE International Conference on Embedded and Real-Time Computing Systems and Applications.

[15] Adam Betts, et al. Estimating the WCET of GPU-Accelerated Applications Using Hybrid Analysis, 2013, 2013 25th Euromicro Conference on Real-Time Systems.

[16] Cong Liu, et al. GPES: a preemptive execution system for GPGPU computing, 2015, 21st IEEE Real-Time and Embedded Technology and Applications Symposium.

[17] Xinxin Mei, et al. Dissecting GPU Memory Hierarchy Through Microbenchmarking, 2015, IEEE Transactions on Parallel and Distributed Systems.

[18] Henk Corporaal, et al. Adaptive and transparent cache bypassing for GPUs, 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.

[19] Tarek A. El-Ghazawi, et al. Exploiting concurrent kernel execution on graphic processing units, 2011, 2011 International Conference on High Performance Computing & Simulation.

[20] Björn Andersson, et al. Makespan Computation for GPU Threads Running on a Single Streaming Multiprocessor, 2012, 2012 24th Euromicro Conference on Real-Time Systems.

[21] Ming Yang, et al. Inferring the Scheduling Policies of an Embedded CUDA GPU, 2017.

[22] Avi Mendelson, et al. Scheduling processing of real-time data streams on heterogeneous multi-GPU systems, 2012, SYSTOR '12.

[23] Ming Yang, et al. An Evaluation of the NVIDIA TX1 for Supporting Real-Time Computer-Vision Workloads, 2017, 2017 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS).

[24] Hennadiy Leontyev, et al. Tardiness Bounds for FIFO Scheduling on Multiprocessors, 2007, 19th Euromicro Conference on Real-Time Systems (ECRTS'07).

[25] Avi Mendelson, et al. Batch Method for Efficient Resource Sharing in Real-Time Multi-GPU Systems, 2014, ICDCN.

[26] Depei Qian, et al. Scheduling Tasks with Mixed Timing Constraints in GPU-Powered Real-Time Systems, 2016, ICS.