Scalable directoryless shared memory coherence using execution migration
[1] David A. Patterson, et al. Computer Architecture: A Quantitative Approach (4th ed.), 2007.
[2] Mainak Chaudhuri. PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches, 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[3] Rajeev Balasubramonian, et al. Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches, 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[4] Babak Falsafi, et al. Reactive NUCA: near-optimal block placement and replication in distributed caches, 2009, ISCA '09.
[5] Anant Agarwal, et al. Energy Scalability of On-Chip Interconnection Networks in Multicore Architectures, 2008.
[6] Mahmut T. Kandemir, et al. A novel migration-based NUCA design for Chip Multiprocessors, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[7] William J. Dally, et al. Principles and Practices of Interconnection Networks, 2004.
[8] Koushik Chakraborty, et al. Computation spreading: employing hardware migration to specialize CMP cores on-the-fly, 2006, ASPLOS XII.
[9] Robert Tappan Morris, et al. Reinventing Scheduling for Multicore Systems, 2009, HotOS.
[10] Gu-Yeon Wei, et al. Thread motion: fine-grained power management for multi-core systems, 2009, ISCA '09.
[11] D. Banks, et al. Assembly and Packaging, 2006.
[12] David W. Nellans, et al. Micro-pages: increasing DRAM efficiency with locality-aware data placement, 2010, ASPLOS XV.
[13] Aamer Jaleel, et al. Analyzing Parallel Programs with PIN, 2010, Computer.
[14] Richard J. Lipton, et al. A Massive Memory Machine, 1984, IEEE Transactions on Computers.
[15] Stefan Rusu, et al. A 45nm 8-core enterprise Xeon® processor, 2009.
[16] Wilson C. Hsieh, et al. Computation migration: enhancing locality for distributed-memory parallel systems, 1993, PPOPP '93.
[17] David E. Culler, et al. Monsoon: an explicit token-store architecture, 1998, ISCA '98.
[18] Anoop Gupta, et al. Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes, 1990, ICPP.
[19] John L. Hennessy, et al. The performance advantages of integrating block data transfer in cache-coherent multiprocessors, 1994, ASPLOS VI.
[20] Ricardo Bianchini, et al. Using simple page placement policies to reduce the cost of cache fills in coherent shared-memory systems, 1995, Proceedings of 9th International Parallel Processing Symposium.
[21] Pierre Michaud. Exploiting the cache capacity of a single-chip multi-core processor with execution migration, 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[22] Sangyeun Cho, et al. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation, 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[23] Jaehyuk Huh, et al. A NUCA Substrate for Flexible CMP Cache Sharing, 2007, IEEE Transactions on Parallel and Distributed Systems.
[24] Coniferous softwood. GENERAL TERMS, 2003.
[25] Mario Nemirovsky, et al. A Massively Multithreaded Packet Processor, 2004.
[26] Doug Burger, et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches, 2002, ASPLOS X.
[27] Jung Ho Ahn, et al. A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies, 2008, 2008 International Symposium on Computer Architecture.
[28] George Kurian, et al. Graphite: A distributed parallel simulator for multicores, 2010, HPCA-16 2010: The Sixteenth International Symposium on High-Performance Computer Architecture.
[29] Krste Asanovic, et al. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors, 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[30] Anoop Gupta, et al. Operating system support for improving data locality on CC-NUMA compute servers, 1996, ASPLOS VII.
[31] Niraj K. Jha, et al. A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS, 2007, ICCD.
[32] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.
[33] Michael D. Noakes, et al. The J-machine multicomputer: an architectural evaluation, 1993, ISCA '93.
[34] David A. Wood, et al. Managing Wire Delay in Large Chip-Multiprocessor Caches, 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[35] Anoop Gupta, et al. The SPLASH-2 programs: characterization and methodological considerations, 1995, ISCA.
[36] A. Kumar, et al. A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS, 2007.