DIRECTORYLESS SHARED MEMORY COHERENCE USING EXECUTION MIGRATION
[1] Anant Agarwal,et al. Energy Scalability of On-Chip Interconnection Networks in Multicore Architectures, 2008.
[2] Robert Tappan Morris,et al. Reinventing Scheduling for Multicore Systems , 2009, HotOS.
[3] Mahmut T. Kandemir,et al. A novel migration-based NUCA design for Chip Multiprocessors , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[4] Srinivas Devadas,et al. Deadlock-free fine-grained thread migration , 2011, Proceedings of the Fifth ACM/IEEE International Symposium.
[5] Jung Ho Ahn,et al. A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies , 2008, 2008 International Symposium on Computer Architecture.
[6] George Kurian,et al. Graphite: A distributed parallel simulator for multicores , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[7] Aamer Jaleel,et al. Analyzing Parallel Programs with PIN , 2010, Computer.
[8] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[9] Mainak Chaudhuri. PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[10] Rajeev Balasubramonian,et al. Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[11] William J. Dally,et al. Principles and Practices of Interconnection Networks, 2004.
[12] Krste Asanovic,et al. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[13] Anoop Gupta,et al. Operating system support for improving data locality on CC-NUMA compute servers , 1996, ASPLOS VII.
[14] Niraj K. Jha,et al. A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS , 2007, ICCD.
[15] Wilson C. Hsieh,et al. Computation migration: enhancing locality for distributed-memory parallel systems , 1993, PPOPP '93.
[16] David E. Culler,et al. Monsoon: an explicit token-store architecture , 1998, ISCA '98.
[17] Marcelo Cintra,et al. An OS-based alternative to full hardware coherence on tiled CMPs , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[18] Vijayalakshmi Srinivasan,et al. A Tagless Coherence Directory , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[19] Omer Khan,et al. System-level Optimizations for Memory Access in the Execution Migration Machine (EM²), 2011.
[20] Michael D. Noakes,et al. The J-machine multicomputer: an architectural evaluation , 1993, ISCA '93.
[21] Pierre Michaud. Exploiting the cache capacity of a single-chip multi-core processor with execution migration , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[22] Sangyeun Cho,et al. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[23] Anoop Gupta,et al. Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes , 1990, ICPP.
[24] Dean M. Tullsen,et al. Proximity-aware directory-based coherence for multi-core processor architectures , 2007, SPAA '07.
[25] John L. Hennessy,et al. The performance advantages of integrating block data transfer in cache-coherent multiprocessors , 1994, ASPLOS VI.
[26] Jean-Luc Gaudiot,et al. Nomadic Threads: a migrating multithreaded approach to remote memory accesses in multiprocessors , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique.
[27] Ricardo Bianchini,et al. Using simple page placement policies to reduce the cost of cache fills in coherent shared-memory systems , 1995, Proceedings of 9th International Parallel Processing Symposium.
[28] Jean-Luc Gaudiot,et al. An evaluation of thread migration for exploiting distributed array locality , 2002, Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications.
[29] Babak Falsafi,et al. Reactive NUCA: near-optimal block placement and replication in distributed caches , 2009, ISCA '09.
[30] D. Banks,et al. Assembly and Packaging, 2006.
[31] George Kurian,et al. ATAC: A 1000-core cache-coherent processor with on-chip optical network , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[32] David W. Nellans,et al. Micro-pages: increasing DRAM efficiency with locality-aware data placement , 2010, ASPLOS XV.
[34] Sandhya Dwarkadas,et al. SPACE: Sharing pattern-based directory coherence for multicore scalability , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[35] Gu-Yeon Wei,et al. Thread motion: fine-grained power management for multi-core systems , 2009, ISCA '09.
[36] Richard J. Lipton,et al. A Massive Memory Machine , 1984, IEEE Transactions on Computers.
[37] Stefan Rusu,et al. A 45nm 8-core enterprise Xeon® processor, 2009.
[38] Koushik Chakraborty,et al. Computation spreading: employing hardware migration to specialize CMP cores on-the-fly , 2006, ASPLOS XII.
[39] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[40] A. Kumar,et al. A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS, 2007.