Quality of service support for fine-grained sharing on GPUs
Rami G. Melhem | Minyi Guo | Youtao Zhang | Jun Yang | Bruce R. Childers | Zhenning Wang
[1] Rami G. Melhem,et al. SAWS: Synchronization aware GPGPU warp scheduling for multiple independent warp schedulers , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[2] Won Woo Ro,et al. Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[3] Rami G. Melhem,et al. Simultaneous Multikernel: Fine-Grained Sharing of GPUs , 2016, IEEE Computer Architecture Letters.
[4] Quan Chen,et al. DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[5] Nam Sung Kim,et al. The case for GPGPU spatial multitasking , 2012, IEEE International Symposium on High-Performance Computer Architecture.
[6] Mikhail Bautin,et al. Graphic engine resource management , 2008, Electronic Imaging.
[7] R. Govindarajan,et al. Improving GPGPU concurrency with elastic kernels , 2013, ASPLOS '13.
[8] Kevin Skadron,et al. Fine-grained resource sharing for concurrent GPGPU kernels , 2012, HotPar'12.
[9] Jianlong Zhong,et al. Medusa: Simplified Graph Processing on GPUs , 2014, IEEE Transactions on Parallel and Distributed Systems.
[10] Harrick M. Vin,et al. A hierarchical CPU scheduler for multimedia operating systems , 1996, OSDI '96.
[11] John Kim,et al. Improving GPGPU resource utilization through alternative thread block scheduling , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[12] Michael L. Scott,et al. Enabling OS Research by Inferring Interactions in the Black-Box GPU Stack , 2013, USENIX Annual Technical Conference.
[13] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[14] R. Govindarajan,et al. Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[15] Naga K. Govindaraju,et al. Mars: A MapReduce Framework on graphics processors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[16] David R. Cheriton,et al. Borrowed-virtual-time (BVT) scheduling: supporting latency-sensitive threads in a general-purpose scheduler , 1999, OPSR.
[17] Xiangyu Li,et al. Mystic: Predictive Scheduling for GPU Based Cloud Servers Using Machine Learning , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[18] Scott A. Mahlke,et al. Equalizer: Dynamic Tuning of GPU Resources for Efficient Execution , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[19] Nam Sung Kim,et al. Fair share: Allocation of GPU resources for both performance and fairness , 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).
[20] Rami G. Melhem,et al. Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[21] Mahmut T. Kandemir,et al. Exploiting Core Criticality for Enhanced GPU Performance , 2016, SIGMETRICS.
[23] Mateo Valero,et al. Enabling preemptive multiprogramming on GPUs , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[24] Mahmut T. Kandemir,et al. Managing GPU Concurrency in Heterogeneous Architectures , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[25] Wei Yi,et al. Kernel Fusion: An Effective Method for Better Power Efficiency on Multithreaded GPU , 2010, 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing.
[26] Prashant J. Shenoy,et al. Surplus fair scheduling: a proportional-share CPU scheduling algorithm for symmetric multiprocessors , 2000, OSDI.
[27] Scott Shenker,et al. Analysis and simulation of a fair queueing algorithm , 1989, SIGCOMM '89.
[28] Jiaxing Zhang,et al. Minerva: A Scalable and Highly Efficient Training Platform for Deep Learning , 2014 .
[29] Yin Wang,et al. VGRIS: Virtualized GPU Resource Isolation and Scheduling in Cloud Gaming , 2013, TACO.
[30] Quan Chen,et al. Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers , 2016, ASPLOS.
[31] Michael L. Scott,et al. Disengaged scheduling for fair, protected access to fast computational accelerators , 2014, ASPLOS.
[33] Nam Sung Kim,et al. GPUWattch: enabling energy optimizations in GPGPUs , 2013, ISCA.
[34] Klara Nahrstedt,et al. Energy-efficient soft real-time CPU scheduling for mobile multimedia systems , 2003, SOSP '03.
[35] Shinpei Kato,et al. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments , 2011, USENIX Annual Technical Conference.
[36] Mahmut T. Kandemir,et al. OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance , 2013, ASPLOS '13.
[37] Michael F. P. O'Boyle,et al. Portable and transparent software managed scheduling on accelerators for fair resource sharing , 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[38] Nam Sung Kim,et al. QoS-aware dynamic resource allocation for spatial-multitasking GPUs , 2014, 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC).
[39] George Varghese,et al. Efficient fair queueing using deficit round robin , 1995, SIGCOMM '95.
[40] Hussein M. Abdel-Wahab,et al. A proportional share resource allocation algorithm for real-time, time-shared systems , 1996, 17th IEEE Real-Time Systems Symposium.
[41] Scott A. Mahlke,et al. Chimera: Collaborative Preemption for Multitasking on a Shared GPU , 2015, ASPLOS.
[42] Mark Silberstein,et al. PTask: operating system abstractions to manage GPUs as compute devices , 2011, SOSP.
[43] Dong Li,et al. Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations , 2015, ICS.
[44] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .