SMGuard: A Flexible and Fine-Grained Resource Management Framework for GPUs
Depei Qian | Yuebin Bai | Yuhao Gu | Hailong Yang | Kun Cheng | Zhongzhi Luan | Chao Yu