Maximizing the GPU resource usage by reordering concurrent kernels submission
Esteban Walter Gonzalez Clua | Lúcia Maria de A. Drummond | Cristiana Bentes | Eduardo C. Vasconcellos | Bernardo B. Labronici | Rommel Anatoli Quintanilla Cruz | Pablo Carvalho
[1] Kevin Skadron, et al. Enabling Task Parallelism in the CUDA Scheduler, 2009.
[2] Srimat T. Chakradhar, et al. Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework, 2011, HPDC '11.
[3] Jianlong Zhong, et al. Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling, 2013, IEEE Transactions on Parallel and Distributed Systems.
[4] Mateo Valero, et al. Enabling preemptive multiprogramming on GPUs, 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[5] Yun Liang, et al. Efficient GPU Spatial-Temporal Multitasking, 2015, IEEE Transactions on Parallel and Distributed Systems.
[6] Tulika Mitra, et al. Improving GPGPU energy-efficiency through concurrent kernel execution and DVFS, 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[7] Joseph Zambreno, et al. Increasing GPU throughput using kernel interleaved thread block scheduling, 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).
[8] Norbert Luttenberger, et al. Efficiently Using a CUDA-enabled GPU as Shared Resource, 2010, 2010 10th IEEE International Conference on Computer and Information Technology.
[9] Nam Sung Kim, et al. The case for GPGPU spatial multitasking, 2012, IEEE International Symposium on High-Performance Computer Architecture.
[10] R. Govindarajan, et al. Improving GPGPU concurrency with elastic kernels, 2013, ASPLOS '13.
[11] Collin McCurdy, et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite, 2010, GPGPU-3.
[12] Scott A. Mahlke, et al. Chimera: Collaborative Preemption for Multitasking on a Shared GPU, 2015, ASPLOS.
[13] Paolo Toth, et al. Knapsack Problems: Algorithms and Computer Implementations, 1990.
[14] Tarek A. El-Ghazawi, et al. Exploiting concurrent kernel execution on graphic processing units, 2011, 2011 International Conference on High Performance Computing & Simulation.
[15] Stijn Eyerman, et al. System-Level Performance Metrics for Multiprogram Workloads, 2008, IEEE Micro.
[16] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[17] Jong-Myon Kim, et al. An efficient scheduling scheme using estimated execution time for heterogeneous computing systems, 2013, The Journal of Supercomputing.
[18] Wei Yi, et al. Kernel Fusion: An Effective Method for Better Power Efficiency on Multithreaded GPU, 2010, 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing.
[19] Wen-mei W. Hwu, et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing, 2012.
[20] Vikram K. Narayana, et al. A Power-Aware Symbiotic Scheduling Algorithm for Concurrent GPU Kernels, 2015, 2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS).
[21] T. Steinke, et al. On Improving the Performance of Multi-threaded CUDA Applications with Concurrent Kernel Execution by Kernel Reordering, 2012, 2012 Symposium on Application Accelerators in High Performance Computing.
[22] Alexander Mendiburu, et al. A Survey of Performance Modeling and Simulation Techniques for Accelerator-Based Computing, 2015, IEEE Transactions on Parallel and Distributed Systems.
[23] Kevin Skadron, et al. Fine-grained resource sharing for concurrent GPGPU kernels, 2012, HotPar'12.