Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems
暂无分享,去创建一个
Kevin Kai-Wei Chang | Rachata Ausavarungnirun | Onur Mutlu | Gabriel H. Loh | Lavanya Subramanian | O. Mutlu | Rachata Ausavarungnirun | Lavanya Subramanian | K. Chang | G. Loh
[1] James E. Smith,et al. Complexity-Effective Superscalar Processors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[2] Sally A. McKee,et al. Dynamic Access Ordering for Streamed Computations , 2000, IEEE Trans. Computers.
[3] A. Snavely,et al. Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.
[4] William J. Dally,et al. Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[5] Rajiv Kapoor,et al. Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[6] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[7] Zhao Zhang,et al. A performance comparison of DRAM memory system optimizations for SMT processors , 2005, 11th International Symposium on High-Performance Computer Architecture.
[8] James E. Smith,et al. Fair Queuing Memory Systems , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[9] Onur Mutlu,et al. Memory Performance Attacks: Denial of Memory Service in Multi-Core Systems , 2007, USENIX Security Symposium.
[10] Onur Mutlu,et al. Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[11] Tao Li,et al. Informed Microarchitecture Design Space Exploration Using Workload Dynamics , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[12] Won-Taek Lim,et al. Effective Management of DRAM Bandwidth in Multicore Processors , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[13] Erik Lindholm,et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.
[14] Onur Mutlu,et al. Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems , 2008, 2008 International Symposium on Computer Architecture.
[15] Stijn Eyerman,et al. System-Level Performance Metrics for Multiprogram Workloads , 2008, IEEE Micro.
[16] Onur Mutlu,et al. Self-Optimizing Memory Controllers: A Reinforcement Learning Approach , 2008, 2008 International Symposium on Computer Architecture.
[17] Chita R. Das,et al. Application-aware prioritization mechanisms for on-chip networks , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[18] N. H. Kim. Effect of Instruction Fetch and Memory Scheduling on GPU Performance , 2009 .
[19] Tor M. Aamodt,et al. Complexity effective memory access scheduling for many-core accelerator architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[20] Mor Harchol-Balter,et al. ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[21] Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems , 2010, ASPLOS XV.
[22] Pradeep Dubey,et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU , 2010, ISCA.
[23] Mor Harchol-Balter,et al. Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[24] Mark Silberstein,et al. PTask: operating system abstractions to manage GPUs as compute devices , 2011, SOSP.
[25] Sai Prashanth Muralidhara,et al. Reducing memory interference in multicore systems via application-aware memory channel partitioning , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[26] Brad Burgess,et al. Bobcat: AMD's Low-Power x86 Processor , 2011, IEEE Micro.
[27] Chris Fallin,et al. Parallel application memory scheduling , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[28] Hans Vandierendonck,et al. Fairness Metrics for Multi-Threaded Processors , 2011, IEEE Computer Architecture Letters.
[29] Lizy Kurian John,et al. Minimalist open-page: A DRAM page-mode scheduling policy for the many-core era , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[30] Mattan Erez,et al. A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC , 2012, DAC Design Automation Conference 2012.
[31] Jinwoo Shin,et al. DRAM Scheduling Policy for GPGPU Architectures Based on a Potential Function , 2012, IEEE Computer Architecture Letters.