Classification-Driven Search for Effective SM Partitioning in Multitasking GPUs
暂无分享,去创建一个
[1] Rachata Ausavarungnirun,et al. Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[2] Nam Sung Kim,et al. The case for GPGPU spatial multitasking , 2012, IEEE International Symposium on High-Performance Comp Architecture.
[3] Nanning Zheng,et al. Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[4] Amin Jadidi. Kernel-Based Energy Optimization In GPUs , 2015 .
[5] Won Woo Ro,et al. Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[6] Nam Sung Kim,et al. GPUWattch: enabling energy optimizations in GPGPUs , 2013, ISCA.
[7] Nam Sung Kim,et al. GPU register file virtualization , 2015, MICRO.
[8] Nanning Zheng,et al. POSTER: Accelerate GPU Concurrent Kernel Execution by Mitigating Memory Pipeline Stalls , 2017, 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[9] Quan Chen,et al. Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers , 2016, ASPLOS.
[10] Naga K. Govindaraju,et al. Mars: A MapReduce Framework on graphics processors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[11] Hyesoon Kim,et al. TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture , 2012, IEEE International Symposium on High-Performance Comp Architecture.
[12] Srimat T. Chakradhar,et al. Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework , 2011, HPDC '11.
[13] Mohamed Ibrahim,et al. Efficient and Fair Multi-programming in GPUs via Effective Bandwidth Management , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[14] Mahmut T. Kandemir,et al. Anatomy of GPU Memory System for Multi-Application Execution , 2015, MEMSYS.
[15] Nam Sung Kim,et al. Fair share: Allocation of GPU resources for both performance and fairness , 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).
[16] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .
[17] Shinpei Kato,et al. GPUvm: Why Not Virtualizing GPUs at the Hypervisor? , 2014, USENIX Annual Technical Conference.
[18] Xiuhong Li,et al. Efficient kernel management on GPUs , 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[19] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[20] Lifan Xu,et al. Auto-tuning a high-level language targeted to GPU codes , 2012, 2012 Innovative Parallel Computing (InPar).
[21] 日経BP社,et al. Amazon Web Services完全ソリューションガイド , 2016 .
[22] Rami G. Melhem,et al. Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[23] Amin Jadidi,et al. Optimizing energy consumption in GPUS through feedback-driven CTA scheduling , 2017, SpringSim.
[24] Michael F. P. O'Boyle,et al. Portable and transparent software managed scheduling on accelerators for fair resource sharing , 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[25] Onur Mutlu,et al. Zorua: A holistic approach to resource virtualization in GPUs , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[26] Scott A. Mahlke,et al. Dynamic Resource Management for Efficient Utilization of Multitasking GPUs , 2017, ASPLOS.
[27] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[28] Mateo Valero,et al. Enabling preemptive multiprogramming on GPUs , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[29] Scott A. Mahlke,et al. Chimera: Collaborative Preemption for Multitasking on a Shared GPU , 2015, ASPLOS.
[30] Joseph Zambreno,et al. Increasing GPU throughput using kernel interleaved thread block scheduling , 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).
[31] Stijn Eyerman,et al. System-Level Performance Metrics for Multiprogram Workloads , 2008, IEEE Micro.