Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels
R. Govindarajan | Matthew J. Thazhuthaveetil | Sreepathi Pai
[1] Richard W. Vuduc et al. A performance analysis framework for identifying potential benefits in GPGPU applications, 2012, PPoPP '12.
[2] Mattan Erez et al. The dual-path execution model for efficient GPU control flow, 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[3] William Gropp et al. An adaptive performance modeling tool for GPU architectures, 2010, PPoPP '10.
[4] Wen-mei W. Hwu et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing, 2012.
[5] J. Xu. OpenCL – The Open Standard for Parallel Programming of Heterogeneous Systems, 2009.
[6] K. Srinathan et al. A performance prediction model for the CUDA GPGPU platform, 2009, 2009 International Conference on High Performance Computing (HiPC).
[7] Shinpei Kato et al. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments, 2011, USENIX Annual Technical Conference.
[8] Tajana Simunic et al. Temperature aware thread block scheduling in GPGPUs, 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).
[9] Xipeng Shen et al. A cross-input adaptive framework for GPU program optimizations, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[10] Scott A. Mahlke et al. Adaptive input-aware compilation for graphics engines, 2012, PLDI '12.
[11] Nam Sung Kim et al. The case for GPGPU spatial multitasking, 2012, IEEE International Symposium on High-Performance Computer Architecture.
[12] Kevin Skadron et al. Enabling Task Parallelism in the CUDA Scheduler, 2009.
[13] Srimat T. Chakradhar et al. Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework, 2011, HPDC '11.
[15] Xiaoyuan Li et al. Guided Region-Based GPU Scheduling: Utilizing Multi-thread Parallelism to Hide Memory Latency, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[16] Mahmut T. Kandemir et al. Neither more nor less: Optimizing thread-level parallelism for GPGPUs, 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[17] Margaret Martonosi et al. Reducing GPU offload latency via fine-grained CPU-GPU synchronization, 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[18] Hyesoon Kim et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[19] R. Govindarajan et al. Improving GPGPU concurrency with elastic kernels, 2013, ASPLOS '13.
[20] Gregory Diamos et al. Harmony: an execution model and runtime for heterogeneous many core systems, 2008, HPDC '08.
[21] Kevin Skadron et al. Dynamic Heterogeneous Scheduling Decisions Using Historical Runtime Data, 2011.
[22] Stijn Eyerman et al. System-Level Performance Metrics for Multiprogram Workloads, 2008, IEEE Micro.
[23] Laxmi N. Bhuyan et al. A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures, 2013, TACO.
[24] Hamid Laga et al. CUDA (Compute Unified Device Architecture), 2009.
[25] Henry Wong et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[26] Michael J. Schulte et al. ERCBench: An Open-Source Benchmark Suite for Embedded and Reconfigurable Computing, 2010, 2010 International Conference on Field Programmable Logic and Applications.
[27] Kevin Skadron et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[28] Kevin Skadron et al. Fine-grained resource sharing for concurrent GPGPU kernels, 2012, HotPar '12.
[29] Grigori Fursin et al. Predictive Runtime Code Scheduling for Heterogeneous Architectures, 2008, HiPEAC.
[30] Mahmut T. Kandemir et al. OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance, 2013, ASPLOS '13.
[31] Margaret Martonosi et al. Stargazer: Automated regression-based GPU design space exploration, 2012, 2012 IEEE International Symposium on Performance Analysis of Systems & Software.
[32] John Kim et al. Improving GPGPU resource utilization through alternative thread block scheduling, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).