FlexSched: Efficient scheduling techniques for concurrent kernel execution on GPUs

[1] José María González-Linares, et al. FlexSched: Efficient scheduling techniques for concurrent kernel execution on GPUs, 2021, J. Supercomput.

[2] Hadi Sadoghi Yazdi, et al. cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs, 2020, IEEE Transactions on Parallel and Distributed Systems.

[3] Lieven Eeckhout, et al. HSM: A Hybrid Slowdown Model for Multitasking GPUs, 2020, ASPLOS.

[4] Minyi Guo, et al. Themis: Predicting and Reining in Application-Level Slowdown on Spatial Multitasking GPUs, 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[5] Edson Cataldo, et al. Kernel concurrency opportunities based on GPU benchmarks characterization, 2019, Cluster Computing.

[6] Depei Qian, et al. SMGuard: A Flexible and Fine-Grained Resource Management Framework for GPUs, 2018, IEEE Transactions on Parallel and Distributed Systems.

[7] Nanning Zheng, et al. Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls, 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[8] Mohamed Ibrahim, et al. Efficient and Fair Multi-programming in GPUs via Effective Bandwidth Management, 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[9] Juan Gómez-Luna, et al. A tasks reordering model to reduce transfers overhead on GPUs, 2017, J. Parallel Distributed Comput.

[10] Antonio J. Peña, et al. Chai: Collaborative heterogeneous applications for integrated-architectures, 2017, 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[11] Changjun Jiang, et al. FLEP: Enabling Flexible and Efficient Preemption on GPUs, 2017, ASPLOS.

[12] Quan Chen, et al. Prophet: Precise QoS Prediction on Non-Preemptive Accelerators to Improve Utilization in Warehouse-Scale Computers, 2017, ASPLOS.

[13] Scott A. Mahlke, et al. Dynamic Resource Management for Efficient Utilization of Multitasking GPUs, 2017, ASPLOS.

[14] Yue Zhao, et al. EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU, 2017, PPoPP.

[15] Won Woo Ro, et al. Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[16] Rami G. Melhem, et al. Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing, 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[17] Dong Li, et al. Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations, 2015, ICS.

[18] Scott A. Mahlke, et al. Chimera: Collaborative Preemption for Multitasking on a Shared GPU, 2015, ASPLOS.

[19] Yun Liang, et al. Efficient GPU Spatial-Temporal Multitasking, 2015, IEEE Transactions on Parallel and Distributed Systems.

[20] Mateo Valero, et al. Enabling preemptive multiprogramming on GPUs, 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[21] R. Govindarajan, et al. Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels, 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).

[22] Mohammad Abdullah Al Faruque, et al. GPU-EvR: Run-time event based real-time scheduling framework on GPGPU platform, 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[23] Jianlong Zhong, et al. Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling, 2013, IEEE Transactions on Parallel and Distributed Systems.

[24] R. Govindarajan, et al. Improving GPGPU concurrency with elastic kernels, 2013, ASPLOS.

[25] T. Steinke, et al. On Improving the Performance of Multi-threaded CUDA Applications with Concurrent Kernel Execution by Kernel Reordering, 2012, 2012 Symposium on Application Accelerators in High Performance Computing.

[26] Nam Sung Kim, et al. The case for GPGPU spatial multitasking, 2012, IEEE International Symposium on High Performance Computer Architecture (HPCA).

[27] Shinpei Kato, et al. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments, 2011, USENIX Annual Technical Conference.

[28] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[29] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[30] Ralf Eggeling, et al. User guide, 2000.