FlexSched: Efficient scheduling techniques for concurrent kernel execution on GPUs

N. Guil | J. González-Linares | Bernabé López-Albelda | Francisco M. Castro

[1] José María González-Linares, et al. FlexSched: Efficient scheduling techniques for concurrent kernel execution on GPUs, 2021, J. Supercomput.

[2] Hadi Sadoghi Yazdi, et al. cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs, 2020, IEEE Transactions on Parallel and Distributed Systems.

[3] Lieven Eeckhout, et al. HSM: A Hybrid Slowdown Model for Multitasking GPUs, 2020, ASPLOS.

[4] Minyi Guo, et al. Themis: Predicting and Reining in Application-Level Slowdown on Spatial Multitasking GPUs, 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[5] Edson Cataldo, et al. Kernel concurrency opportunities based on GPU benchmarks characterization, 2019, Cluster Computing.

[6] Depei Qian, et al. SMGuard: A Flexible and Fine-Grained Resource Management Framework for GPUs, 2018, IEEE Transactions on Parallel and Distributed Systems.

[7] Nanning Zheng, et al. Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls, 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[8] Mohamed Ibrahim, et al. Efficient and Fair Multi-programming in GPUs via Effective Bandwidth Management, 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[9] Juan Gómez-Luna, et al. A tasks reordering model to reduce transfers overhead on GPUs, 2017, J. Parallel Distributed Comput.

[10] Antonio J. Peña, et al. Chai: Collaborative heterogeneous applications for integrated-architectures, 2017, 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[11] Changjun Jiang, et al. FLEP: Enabling Flexible and Efficient Preemption on GPUs, 2017, ASPLOS.

[12] Quan Chen, et al. Prophet: Precise QoS Prediction on Non-Preemptive Accelerators to Improve Utilization in Warehouse-Scale Computers, 2017, ASPLOS.

[13] Scott A. Mahlke, et al. Dynamic Resource Management for Efficient Utilization of Multitasking GPUs, 2017, ASPLOS.

[14] Yue Zhao, et al. EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU, 2017, PPoPP.

[15] Won Woo Ro, et al. Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[16] Rami G. Melhem, et al. Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing, 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[17] Dong Li, et al. Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations, 2015, ICS.

[18] Scott A. Mahlke, et al. Chimera: Collaborative Preemption for Multitasking on a Shared GPU, 2015, ASPLOS.

[19] Yun Liang, et al. Efficient GPU Spatial-Temporal Multitasking, 2015, IEEE Transactions on Parallel and Distributed Systems.

[20] Mateo Valero, et al. Enabling preemptive multiprogramming on GPUs, 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[21] R. Govindarajan, et al. Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels, 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).

[22] Mohammad Abdullah Al Faruque, et al. GPU-EvR: Run-time event based real-time scheduling framework on GPGPU platform, 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[23] Jianlong Zhong, et al. Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling, 2013, IEEE Transactions on Parallel and Distributed Systems.

[24] R. Govindarajan, et al. Improving GPGPU concurrency with elastic kernels, 2013, ASPLOS '13.

[25] T. Steinke, et al. On Improving the Performance of Multi-threaded CUDA Applications with Concurrent Kernel Execution by Kernel Reordering, 2012, 2012 Symposium on Application Accelerators in High Performance Computing.

[26] Nam Sung Kim, et al. The case for GPGPU spatial multitasking, 2012, IEEE International Symposium on High-Performance Computer Architecture.

[27] Shinpei Kato, et al. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments, 2011, USENIX Annual Technical Conference.

[28] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[29] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[30] Ralf Eggeling, et al. User guide, 2000.