Analyzing and Estimating the Performance of Concurrent Kernels Execution on GPUs
Cristiana Bentes | Esteban Clua | Rommel Cruz | Lúcia M. A. Drummond
[1] T. Steinke, et al. On Improving the Performance of Multi-threaded CUDA Applications with Concurrent Kernel Execution by Kernel Reordering, 2012, 2012 Symposium on Application Accelerators in High Performance Computing.
[2] Mahmut T. Kandemir, et al. Anatomy of GPU Memory System for Multi-Application Execution, 2015, MEMSYS.
[3] Youyou Lu, et al. Run-Time Performance Estimation and Fairness-Oriented Scheduling Policy for Concurrent GPGPU Applications, 2016, 2016 45th International Conference on Parallel Processing (ICPP).
[4] Mateo Valero, et al. Enabling preemptive multiprogramming on GPUs, 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[5] David Black-Schaffer, et al. Partitioning GPUs for Improved Scalability, 2016, 2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).
[6] Shinpei Kato, et al. GPUvm: Why Not Virtualizing GPUs at the Hypervisor?, 2014, USENIX Annual Technical Conference.
[7] R. Govindarajan, et al. Improving GPGPU concurrency with elastic kernels, 2013, ASPLOS '13.
[8] Scott A. Mahlke, et al. Chimera: Collaborative Preemption for Multitasking on a Shared GPU, 2015, ASPLOS.
[9] Tao Li, et al. Exploring GPGPU workloads: Characterization methodology, analysis and microarchitecture evaluation implications, 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[10] Onur Mutlu, et al. The application slowdown model: Quantifying and controlling the impact of inter-application interference at shared caches and main memory, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[11] Zhongliang Chen, et al. NUPAR: A Benchmark Suite for Modern GPU Architectures, 2015, ICPE.
[12] Nam Sung Kim, et al. The case for GPGPU spatial multitasking, 2012, IEEE International Symposium on High-Performance Comp Architecture.
[13] Mattan Erez, et al. A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC, 2012, DAC Design Automation Conference 2012.
[14] Kevin Skadron, et al. A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads, 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[15] Ben H. H. Juurlink, et al. GPGPU workload characteristics and performance analysis, 2014, 2014 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV).
[16] Nam Sung Kim, et al. Fair share: Allocation of GPU resources for both performance and fairness, 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).
[17] Rachata Ausavarungnirun, et al. Techniques for Shared Resource Management in Systems with Throughput Processors, 2018, ArXiv.
[18] Vikram K. Narayana, et al. GPU Resource Sharing and Virtualization on High Performance Computing Systems, 2011, 2011 International Conference on Parallel Processing.
[19] Vikram K. Narayana, et al. A Power-Aware Symbiotic Scheduling Algorithm for Concurrent GPU Kernels, 2015, 2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS).