Performance characterization, prediction, and optimization for heterogeneous systems with multi-level memory interference
暂无分享,去创建一个
[1] Margaret Martonosi,et al. Reducing GPU offload latency via fine-grained CPU-GPU synchronization , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[2] Stijn Eyerman,et al. System-Level Performance Metrics for Multiprogram Workloads , 2008, IEEE Micro.
[3] Kevin Skadron,et al. Pannotia: Understanding irregular GPGPU graph applications , 2013, 2013 IEEE International Symposium on Workload Characterization (IISWC).
[4] Xiaojin Zhu,et al. Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[5] Kim M. Hazelwood,et al. Where is the data? Why you cannot debate CPU vs. GPU performance without the answer , 2011, (IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE.
[6] Kevin Skadron,et al. A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads , 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[7] Michael F. P. O'Boyle,et al. Merge or Separate?: Multi-job Scheduling for OpenCL Kernels on CPU/GPU Platforms , 2017, GPGPU@PPoPP.
[8] Michael F. P. O'Boyle,et al. Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms , 2014, 2014 21st International Conference on High Performance Computing (HiPC).
[9] M. Martonosi,et al. Adaptive timekeeping replacement: Fine-grained capacity management for shared CMP caches , 2011, TACO.
[10] Michael F. P. O'Boyle,et al. Portable and transparent software managed scheduling on accelerators for fair resource sharing , 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[11] Tarek S. Abdelrahman,et al. Launch-Time Optimization of OpenCL GPU Kernels , 2017, GPGPU@PPoPP.
[12] Srihari Makineni,et al. Communist, Utilitarian, and Capitalist cache policies on CMPs: Caches as a shared resource , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[13] Ján Veselý,et al. Observations and opportunities in architecting shared virtual memory for heterogeneous systems , 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[14] Yale N. Patt,et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[15] Hyesoon Kim,et al. TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture , 2012, IEEE International Symposium on High-Performance Comp Architecture.
[16] Aamer Jaleel,et al. Adaptive insertion policies for managing shared caches , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[17] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[18] R. Govindarajan,et al. Improving GPGPU concurrency with elastic kernels , 2013, ASPLOS '13.
[19] Kevin Kai-Wei Chang,et al. Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[20] Lieven Eeckhout,et al. Scheduling heterogeneous multi-cores through performance impact estimation (PIE) , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[21] Gabriel H. Loh,et al. PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches , 2009, ISCA '09.
[22] Kevin Skadron,et al. Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[23] Tao Li,et al. Informed Microarchitecture Design Space Exploration Using Workload Dynamics , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[24] Carole-Jean Wu,et al. Characterization and dynamic mitigation of intra-application cache interference , 2011, (IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE.
[25] Antonia Zhai,et al. Managing shared last-level cache in a heterogeneous multicore processor , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[26] Andrew Siegel,et al. XSBENCH - THE DEVELOPMENT AND VERIFICATION OF A PERFORMANCE ABSTRACTION FOR MONTE CARLO REACTOR ANALYSIS , 2014 .
[27] Onur Mutlu,et al. Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[28] Eduard Ayguadé,et al. Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications , 2016, 2016 IEEE International Symposium on Workload Characterization (IISWC).
[29] Mohamed Zahran,et al. Heterogeneous Computing: Here to Stay , 2016, ACM Queue.
[30] Lizhong Chen,et al. Futility Scaling: High-Associativity Cache Partitioning , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[31] Xiangyu Li,et al. Hetero-mark, a benchmark suite for CPU-GPU collaborative computing , 2016, 2016 IEEE International Symposium on Workload Characterization (IISWC).
[32] Derek Chiou,et al. GPGPU performance and power estimation using machine learning , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[33] Mahmut T. Kandemir,et al. Managing GPU Concurrency in Heterogeneous Architectures , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[34] R. Dennis Cook,et al. Cross-Validation of Regression Models , 1984 .
[35] Collin McCurdy,et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite , 2010, GPGPU-3.
[36] Aamer Jaleel,et al. CRUISE: cache replacement and utility-aware scheduling , 2012, ASPLOS XVII.
[37] Laxmi N. Bhuyan,et al. CuMAS: Data Transfer Aware Multi-Application Scheduling for Shared GPUs , 2016, ICS.