Real-Time Scheduling upon a Host-Centric Acceleration Architecture with Data Offloading

Challenging scheduling problems arise when cyber-physical systems are implemented upon heterogeneous platforms that combine (serial) data offloading with (parallel) computation. In this paper, we adapt techniques from scheduling theory to model and analyze real-time workloads on such platforms and to derive scheduling algorithms for them. We characterize the performance of the proposed algorithms both analytically, via the approximation ratio metric, and experimentally, through simulations upon synthetic workloads that are justified via a case study on a CPU-GPU platform. The evaluation exposes some divergence between the analytical and experimental characterizations; we make recommendations on the choice of algorithmic approach that seek to balance the two.
