Compiler Support for Optimizing Memory Bank-Level Parallelism
暂无分享,去创建一个
[1] Jongmoo Choi,et al. Regularities considered harmful: forcing randomness to memory accesses to reduce row buffer conflicts for multi-core, multi-bank systems , 2013, ASPLOS '13.
[2] Monica S. Lam,et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism , 1991, IEEE Trans. Parallel Distributed Syst..
[3] Sai Prashanth Muralidhara,et al. Reducing memory interference in multicore systems via application-aware memory channel partitioning , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[4] Mahmut T. Kandemir,et al. Reshaping cache misses to improve row-buffer locality in multicore systems , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[5] Zhao Zhang,et al. A performance comparison of DRAM memory system optimizations for SMT processors , 2005, 11th International Symposium on High-Performance Computer Architecture.
[6] Onur Mutlu,et al. Self-Optimizing Memory Controllers: A Reinforcement Learning Approach , 2008, 2008 International Symposium on Computer Architecture.
[7] Monica S. Lam,et al. Maximizing parallelism and minimizing synchronization with affine transforms , 1997, POPL '97.
[8] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[9] Mor Harchol-Balter,et al. Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[10] David W. Nellans,et al. Micro-pages: increasing DRAM efficiency with locality-aware data placement , 2010, ASPLOS XV.
[11] Clifford Stein,et al. Introduction to Algorithms, 2nd edition. , 2001 .
[12] Uday Bondhugula,et al. Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors , 2009, PPoPP '09.
[13] Dam Sunwoo,et al. Balancing DRAM locality and parallelism in shared memory CMP systems , 2012, IEEE International Symposium on High-Performance Comp Architecture.
[14] Xiaobing Feng,et al. Software-Hardware Cooperative DRAM Bank Partitioning for Chip Multiprocessors , 2010, NPC.
[15] James H. Anderson,et al. Real-Time Scheduling on Multicore Platforms , 2006, 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'06).
[16] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[17] Marco Caccamo,et al. Memory-centric scheduling for multicore hard real-time systems , 2012, Real-Time Systems.
[18] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.
[19] Mor Harchol-Balter,et al. ATLAS : A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers , 2010 .
[20] Christoforos E. Kozyrakis,et al. Future scaling of processor-memory interfaces , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[21] Mahmut T. Kandemir,et al. On-chip cache hierarchy-aware tile scheduling for multicore machines , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[22] Scott Rixner,et al. Memory Controller Optimizations for Web Servers , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[23] Lei Liu,et al. A software memory partition approach for eliminating bank-level interference in multicore systems , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[24] Onur Mutlu,et al. Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems , 2008, 2008 International Symposium on Computer Architecture.
[25] Xu Cheng,et al. Improving system throughput and fairness simultaneously in shared memory CMP systems via Dynamic Bank Partitioning , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[26] Michael Wolfe,et al. Iteration Space Tiling for Memory Hierarchies , 1987, PPSC.
[27] Onur Mutlu,et al. Improving memory Bank-Level Parallelism in the presence of prefetching , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[28] Onur Mutlu,et al. A Case for MLP-Aware Cache Replacement , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[29] S. Phadke,et al. MLP aware heterogeneous memory system , 2011, 2011 Design, Automation & Test in Europe.
[30] Zhao Zhang,et al. Mini-rank: Adaptive DRAM architecture for improving memory power efficiency , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[31] Xing Zhou,et al. Hierarchical overlapped tiling , 2012, CGO '12.
[32] Christoforos E. Kozyrakis,et al. Improving System Energy Efficiency with Memory Rank Subsetting , 2012, TACO.
[33] Mahmut T. Kandemir,et al. Cache topology aware computation mapping for multicores , 2010, PLDI '10.
[34] Xin-She Yang,et al. Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.
[35] Mahmut T. Kandemir,et al. Addressing End-to-End Memory Access Latency in NoC-Based Multicores , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[36] Kevin Kai-Wei Chang,et al. Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[37] Sarita V. Adve,et al. Code transformations to improve memory parallelism , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[38] Niladrish Chatterjee,et al. Micro-pages: increasing DRAM efficiency with locality-aware data placement , 2010, ASPLOS 2010.
[39] Uday Bondhugula,et al. Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[40] Rudolf Eigenmann,et al. SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance , 2001, WOMPAT.
[41] Norman P. Jouppi,et al. Rethinking DRAM design and organization for energy-constrained multi-cores , 2010, ISCA.
[42] Zhao Zhang,et al. A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality , 2000, MICRO 33.
[43] Onur Mutlu,et al. A case for exploiting subarray-level parallelism (SALP) in DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[44] Onur Mutlu,et al. DRAM-Aware Last-Level Cache Writeback: Reducing Write-Caused Interference in Memory Systems , 2010 .
[45] Sharad Malik,et al. Cache miss equations: an analytical representation of cache misses , 1997, ICS '97.
[46] Yongbing Huang,et al. A Study of Leveraging Memory Level Parallelism for DRAM System on Multi-core/Many-Core Architecture , 2013, 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications.