Understanding Co-run Degradations on Integrated Heterogeneous Processors
暂无分享,去创建一个
Li Shen | Qi Zhu | Zhiying Wang | Bo Wu | Xipeng Shen
[1] Michael F. P. O'Boyle,et al. A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL , 2011, CC.
[2] Hyesoon Kim,et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[3] Wei Yi,et al. Kernel Fusion: An Effective Method for Better Power Efficiency on Multithreaded GPU , 2010, 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing.
[4] Jiri Filipovic,et al. Automatic fusions of CUDA-GPU kernels for parallel map , 2011, CARN.
[5] Jichuan Chang,et al. Cooperative cache partitioning for chip multiprocessors , 2007, ICS '07.
[6] Xipeng Shen,et al. Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs? , 2010, PPoPP '10.
[7] Mark S. Squillante,et al. Using Processor-Cache Affinity Information in Shared-Memory Multiprocessor Scheduling , 1993, IEEE Trans. Parallel Distributed Syst..
[8] Lingjia Tang,et al. Whare-map: heterogeneity in "homogeneous" warehouse-scale computers , 2013, ISCA.
[9] Yutao Zhong,et al. Predicting whole-program locality through reuse distance analysis , 2003, PLDI.
[10] Antonia Zhai,et al. Managing shared last-level cache in a heterogeneous multicore processor , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[11] Michael F. P. O'Boyle,et al. OpenCL Task Partitioning in the Presence of GPU Contention , 2013, LCPC.
[12] Won-Taek Lim,et al. Architectural support for operating system-driven CMP cache management , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[13] Michael D. Smith,et al. Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[14] Long Chen,et al. Dynamic load balancing on single- and multi-GPU systems , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[15] Evangelos P. Markatos,et al. Using processor affinity in loop scheduling on shared-memory multiprocessors , 1992, Supercomputing '92.
[16] Xipeng Shen,et al. A study on optimally co-scheduling jobs of different lengths on chip multiprocessors , 2009, CF '09.
[17] G. Edward Suh,et al. A new memory monitoring scheme for memory-aware scheduling and partitioning , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.
[18] Jie Chen,et al. Analysis and approximation of optimal co-scheduling on Chip Multiprocessors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[19] Gagan Agrawal,et al. Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations , 2010, ICS '10.
[20] Yale N. Patt,et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[21] Sandhya Dwarkadas,et al. Compatible phase co-scheduling on a CMP of multi-threaded processors , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[22] John E. Stone,et al. An asymmetric distributed shared memory model for heterogeneous parallel systems , 2010, ASPLOS 2010.
[23] Xipeng Shen,et al. A cross-input adaptive framework for GPU program optimizations , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[24] Benjamin C. Lee,et al. REF: resource elasticity fairness with sharing incentives for multiprocessors , 2014, ASPLOS.
[25] Sudhakar Yalamanchili,et al. Optimizing Data Warehousing Applications for GPUs Using Kernel Fusion/Fission , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[26] Timo Aila,et al. Understanding the efficiency of ray traversal on GPUs , 2009, High Performance Graphics.
[27] Dean M. Tullsen,et al. Initial observations of the simultaneous multithreading Pentium 4 processor , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[28] Jeff A. Stuart,et al. A study of Persistent Threads style GPU programming for GPGPU workloads , 2012, 2012 Innovative Parallel Computing (InPar).
[29] Wu-chun Feng,et al. Inter-block GPU communication via fast barrier synchronization , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[30] Xipeng Shen,et al. Combining Locality Analysis with Online Proactive Job Co-scheduling in Chip Multiprocessors , 2010, HiPEAC.