Understanding Off-Chip Memory Contention of Parallel Programs in Multicore Systems
[1] Michael D. Smith, et al. Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler, 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[2] James E. Smith, et al. Fair Queuing Memory Systems, 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[3] Gerhard Wellein, et al. LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments, 2010, 2010 39th International Conference on Parallel Processing Workshops.
[4] Tejas Karkhanis, et al. A Day in the Life of a Data Cache Miss, 2002.
[5] Mahmut T. Kandemir, et al. Organizing the last line of defense before hitting the memory wall for CMPs, 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[6] Alexandra Fedorova, et al. Addressing shared resource contention in multicore processors via scheduling, 2010, ASPLOS XV.
[7] Yong Meng Teo, et al. A Practical Approach for Performance Analysis of Shared-Memory Programs, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[8] Brian Rogers, et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling, 2009, ISCA '09.
[9] Raj Jain, et al. The art of computer systems performance analysis - techniques for experimental design, measurement, simulation, and modeling, 1991, Wiley professional computing.
[10] Sangyeun Cho, et al. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation, 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[11] Michael Lang, et al. Analyzing the trade-off between multiple memory controllers and memory channels on multi-core processor performance, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[12] Samuel Williams, et al. The Landscape of Parallel Computing Research: A View from Berkeley, 2006.
[13] James E. Smith, et al. Virtual private caches, 2007, ISCA '07.
[14] Ramesh Illikkal, et al. Rate-based QoS techniques for cache/memory in CMP platforms, 2009, ICS.
[15] Walter Willinger, et al. On the self-similar nature of Ethernet traffic, 1993, SIGCOMM '93.
[16] Steven A. Hofmeyr, et al. Oversubscription on multicore processors, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[17] Onur Mutlu, et al. Self-Optimizing Memory Controllers: A Reinforcement Learning Approach, 2008, 2008 International Symposium on Computer Architecture.
[18] Onur Mutlu, et al. Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems, 2008, 2008 International Symposium on Computer Architecture.
[19] Fang Liu, et al. Understanding how off-chip memory bandwidth partitioning in Chip Multiprocessors affects system performance, 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[20] Kai Li, et al. The PARSEC benchmark suite: Characterization and architectural implications, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[21] Pierre G. Paulin, et al. Multicore design is the challenge! What is the solution?, 2008, 2008 45th ACM/IEEE Design Automation Conference.
[22] Yale N. Patt, et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches, 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[23] Mor Harchol-Balter, et al. ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers, 2010.
[24] Rupak Biswas, et al. Performance impact of resource contention in multicore systems, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[25] David H. Bailey, et al. The NAS parallel benchmarks summary and preliminary results, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[26] Walter Willinger, et al. Self-Similar Network Traffic and Performance Evaluation, 2000.
[27] Gabriel H. Loh, et al. Dynamic Classification of Program Memory Behaviors in CMPs, 2008.
[28] Tulika Mitra, et al. Exploring locking & partitioning for predictable shared caches on multi-cores, 2008, 2008 45th ACM/IEEE Design Automation Conference.