FlexSched: Efficient scheduling techniques for concurrent kernel execution on GPUs
[1] José María González-Linares,et al. FlexSched: Efficient scheduling techniques for concurrent kernel execution on GPUs , 2021, J. Supercomput..
[2] Hadi Sadoghi Yazdi,et al. cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs , 2020, IEEE Transactions on Parallel and Distributed Systems.
[3] Lieven Eeckhout,et al. HSM: A Hybrid Slowdown Model for Multitasking GPUs , 2020, ASPLOS.
[4] Minyi Guo,et al. Themis: Predicting and Reining in Application-Level Slowdown on Spatial Multitasking GPUs , 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[5] Edson Cataldo,et al. Kernel concurrency opportunities based on GPU benchmarks characterization , 2019, Cluster Computing.
[6] Depei Qian,et al. SMGuard: A Flexible and Fine-Grained Resource Management Framework for GPUs , 2018, IEEE Transactions on Parallel and Distributed Systems.
[7] Nanning Zheng,et al. Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[8] Mohamed Ibrahim,et al. Efficient and Fair Multi-programming in GPUs via Effective Bandwidth Management , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[9] Juan Gómez-Luna,et al. A tasks reordering model to reduce transfers overhead on GPUs , 2017, J. Parallel Distributed Comput..
[10] Antonio J. Peña,et al. Chai: Collaborative heterogeneous applications for integrated-architectures , 2017, 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[11] Changjun Jiang,et al. FLEP: Enabling Flexible and Efficient Preemption on GPUs , 2017, ASPLOS.
[12] Quan Chen,et al. Prophet: Precise QoS Prediction on Non-Preemptive Accelerators to Improve Utilization in Warehouse-Scale Computers , 2017, ASPLOS.
[13] Scott A. Mahlke,et al. Dynamic Resource Management for Efficient Utilization of Multitasking GPUs , 2017, ASPLOS.
[14] Yue Zhao,et al. EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU , 2017, PPoPP.
[15] Won Woo Ro,et al. Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[16] Rami G. Melhem,et al. Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[17] Dong Li,et al. Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations , 2015, ICS.
[18] Scott A. Mahlke,et al. Chimera: Collaborative Preemption for Multitasking on a Shared GPU , 2015, ASPLOS.
[19] Yun Liang,et al. Efficient GPU Spatial-Temporal Multitasking , 2015, IEEE Transactions on Parallel and Distributed Systems.
[20] Mateo Valero,et al. Enabling preemptive multiprogramming on GPUs , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[21] R. Govindarajan,et al. Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[22] Mohammad Abdullah Al Faruque,et al. GPU-EvR: Run-time event based real-time scheduling framework on GPGPU platform , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[23] Jianlong Zhong,et al. Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling , 2013, IEEE Transactions on Parallel and Distributed Systems.
[24] R. Govindarajan,et al. Improving GPGPU concurrency with elastic kernels , 2013, ASPLOS '13.
[25] T. Steinke,et al. On Improving the Performance of Multi-threaded CUDA Applications with Concurrent Kernel Execution by Kernel Reordering , 2012, 2012 Symposium on Application Accelerators in High Performance Computing.
[26] Nam Sung Kim,et al. The case for GPGPU spatial multitasking , 2012, IEEE International Symposium on High-Performance Computer Architecture.
[27] Shinpei Kato,et al. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments , 2011, USENIX Annual Technical Conference.
[28] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[29] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[30] Ralf Eggeling,et al. User guide , 2000 .