EffiSha: A Software Framework for Enabling Efficient Preemptive Scheduling of GPU
Yue Zhao | Guoyang Chen | Xipeng Shen | Huiyang Zhou
[1] Dong Li,et al. Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations , 2015, ICS.
[2] Long Chen,et al. Dynamic load balancing on single- and multi-GPU systems , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[3] Stijn Eyerman,et al. System-Level Performance Metrics for Multiprogram Workloads , 2008, IEEE Micro.
[4] Jianlong Zhong,et al. Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling , 2013, IEEE Transactions on Parallel and Distributed Systems.
[5] Idit Keidar,et al. GPUfs: Integrating a file system with GPUs , 2013, TOCS.
[6] Andrew S. Tanenbaum,et al. Modern Operating Systems , 1992 .
[7] Dong Li,et al. PORPLE: An Extensible Optimizer for Portable Data Placement on GPU , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[8] Wu-chun Feng,et al. Inter-block GPU communication via fast barrier synchronization , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[9] Rami G. Melhem,et al. Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[10] Quan Chen,et al. Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers , 2016, ASPLOS.
[11] Michael L. Scott,et al. Disengaged scheduling for fair, protected access to fast computational accelerators , 2014, ASPLOS.
[12] Jeff A. Stuart,et al. A study of Persistent Threads style GPU programming for GPGPU workloads , 2012, 2012 Innovative Parallel Computing (InPar).
[13] Cong Liu,et al. GPES: a preemptive execution system for GPGPU computing , 2015, 21st IEEE Real-Time and Embedded Technology and Applications Symposium.
[14] Guoyang Chen,et al. Coherence-Free Multiview: Enabling Reference-Discerning Data Placement on GPU , 2016, ICS.
[15] Guoyang Chen,et al. Free launch: Optimizing GPU dynamic kernel launches through thread reuse , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[16] Shinpei Kato,et al. GPUvm: Why Not Virtualizing GPUs at the Hypervisor? , 2014, USENIX Annual Technical Conference.
[17] Yi Yang,et al. CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications , 2015, Journal of Computer Science and Technology.
[18] R. Govindarajan,et al. Improving GPGPU concurrency with elastic kernels , 2013, ASPLOS '13.
[19] Shinpei Kato,et al. GDM: device memory management for gpgpu computing , 2014, SIGMETRICS '14.
[20] Dong Li,et al. Optimizing Data Placement on GPU Memory: A Portable Approach , 2017, IEEE Transactions on Computers.
[21] Shinpei Kato,et al. Gdev: First-Class GPU Resource Management in the Operating System , 2012, USENIX Annual Technical Conference.
[22] Dieter Schmalstieg,et al. Whippletree , 2014, ACM Trans. Graph..
[23] Timo Aila,et al. Understanding the efficiency of ray traversal on GPUs , 2009, High Performance Graphics.
[24] Nam Sung Kim,et al. The case for GPGPU spatial multitasking , 2012, IEEE International Symposium on High-Performance Computer Architecture.
[25] Anjul Patney,et al. Task management for irregular-parallel workloads on the GPU , 2010, HPG '10.
[26] Scott A. Mahlke,et al. Chimera: Collaborative Preemption for Multitasking on a Shared GPU , 2015, ASPLOS.
[27] Roberto Di Pietro,et al. CUDA Leaks , 2013, ACM Trans. Embed. Comput. Syst..
[28] Kyoung-Don Kang,et al. Supporting Preemptive Task Executions and Memory Copies in GPGPUs , 2012, 2012 24th Euromicro Conference on Real-Time Systems.
[29] Zhen Lin,et al. Enabling Efficient Preemption for SIMT Architectures with Lightweight Context Switching , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[30] Mateo Valero,et al. Enabling preemptive multiprogramming on GPUs , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[31] Collin McCurdy,et al. Scalable Heterogeneous Computing (SHOC) Benchmark Suite, Version 0.8 , 2009 .
[32] Collin McCurdy,et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite , 2010, GPGPU-3.