Decoupling Contention with Victim Row-Buffer on Multicore Memory Systems
暂无分享,去创建一个
Jie Wu | Dongrui Fan | Ke Gao | Zhiyong Liu
[1] Mor Harchol-Balter,et al. ATLAS : A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers , 2010 .
[2] Sai Prashanth Muralidhara,et al. Reducing memory interference in multicore systems via application-aware memory channel partitioning , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[3] Scott Rixner,et al. Memory Controller Optimizations for Web Servers , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[4] Hideto Hidaka,et al. The cache DRAM architecture: a DRAM with an on-chip cache memory , 1990, IEEE Micro.
[5] Mor Harchol-Balter,et al. Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[6] Kevin Kai-Wei Chang,et al. Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[7] Xiao Zhang,et al. Towards practical page coloring-based multicore cache management , 2009, EuroSys '09.
[8] Hyesoon Kim,et al. TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture , 2012, IEEE International Symposium on High-Performance Comp Architecture.
[9] Manoj Franklin,et al. Balancing thoughput and fairness in SMT processors , 2001, 2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS..
[10] Zhao Zhang,et al. Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[11] Doe Hyun Yoon,et al. The dynamic granularity memory system , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[12] Shunfei Chen,et al. MARSS: A full system simulator for multicore x86 CPUs , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).
[13] Xiaoning Ding,et al. SRM-buffer: an OS buffer management technique to prevent last level cache from thrashing in multicores , 2011, EuroSys '11.
[14] Tao Li,et al. Informed Microarchitecture Design Space Exploration Using Workload Dynamics , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[15] Zhao Zhang,et al. Mini-rank: Adaptive DRAM architecture for improving memory power efficiency , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[16] Chris Fallin,et al. Parallel application memory scheduling , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[17] Christoforos E. Kozyrakis,et al. Improving System Energy Efficiency with Memory Rank Subsetting , 2012, TACO.
[18] Lei Liu,et al. A software memory partition approach for eliminating bank-level interference in multicore systems , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[19] Dam Sunwoo,et al. Balancing DRAM locality and parallelism in shared memory CMP systems , 2012, IEEE International Symposium on High-Performance Comp Architecture.
[20] Yale N. Patt,et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[21] Krste Asanovic,et al. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[22] Norman P. Jouppi,et al. Rethinking DRAM design and organization for energy-constrained multi-cores , 2010, ISCA.
[23] Zhao Zhang,et al. Cached DRAM for ILP Processor Memory Access Latency Reduction , 2001, IEEE Micro.
[24] Onur Mutlu,et al. Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems , 2008, 2008 International Symposium on Computer Architecture.
[25] José F. Martínez,et al. Improving memory scheduling via processor-side load criticality information , 2013, ISCA.
[26] Jung Ho Ahn,et al. A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies , 2008, 2008 International Symposium on Computer Architecture.
[27] Jongmoo Choi,et al. Regularities considered harmful: forcing randomness to memory accesses to reduce row buffer conflicts for multi-core, multi-bank systems , 2013, ASPLOS '13.
[28] Onur Mutlu,et al. Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[29] Peng Li,et al. Heterogeneous multi-channel: Fine-grained DRAM control for both system performance and power efficiency , 2012, DAC Design Automation Conference 2012.
[30] Dongrui Fan,et al. Efficient Address Mapping of Shared Cache for On-Chip Many-Core Architecture , 2010, Euro-Par.
[31] Gershon Kedem,et al. WCDRAM: A fully associative integrated Cached-DRAM with wide cache lines , 1997 .
[32] Lizy Kurian John,et al. Minimalist open-page: A DRAM page-mode scheduling policy for the many-core era , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[33] John L. Henning. SPEC CPU2006 benchmark descriptions , 2006, CARN.
[34] Tor M. Aamodt,et al. Complexity effective memory access scheduling for many-core accelerator architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[35] Lizy Kurian John,et al. A bandwidth-aware memory-subsystem resource management using non-invasive resource profilers for large CMP systems , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[36] Frederick A. Ware,et al. Improving Power and Data Efficiency with Threaded Memory Modules , 2006, 2006 International Conference on Computer Design.
[37] Zhao Zhang,et al. A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality , 2000, MICRO 33.
[38] Bruce Jacob,et al. DRAMSim2: A Cycle Accurate Memory System Simulator , 2011, IEEE Computer Architecture Letters.
[39] Matt T. Yourst. PTLsim: A Cycle Accurate Full System x86-64 Microarchitectural Simulator , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.
[40] Fabrice Bellard,et al. QEMU, a Fast and Portable Dynamic Translator , 2005, USENIX ATC, FREENIX Track.
[41] Xiaobing Feng,et al. Software-Hardware Cooperative DRAM Bank Partitioning for Chip Multiprocessors , 2010, NPC.