The Execution Migration Machine
暂无分享,去创建一个
[1] Richard J. Lipton,et al. A Massive Memory Machine , 1984, IEEE Transactions on Computers.
[2] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[3] Srinivas Devadas,et al. DIRECTORYLESS SHARED MEMORY COHERENCE USING EXECUTION MIGRATION , 2011 .
[4] Srinivas Devadas,et al. Deadlock-free fine-grained thread migration , 2011, Proceedings of the Fifth ACM/IEEE International Symposium.
[5] S. Borkar,et al. An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS , 2008, IEEE Journal of Solid-State Circuits.
[6] Wilson C. Hsieh,et al. Computation migration: enhancing locality for distributed-memory parallel systems , 1993, PPOPP '93.
[7] T. N. Vijaykumar,et al. Distance associativity for high-performance energy-efficient non-uniform cache architectures , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..
[8] David Wentzlaff,et al. Energy characterization of a tiled architecture processor with on-chip networks , 2003, ISLPED '03.
[9] Rajeev Balasubramonian,et al. Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[10] Omer Khan,et al. System-level Optimizations for Memory Access in the Execution Migration Machine ( EM 2 ) , 2011 .
[11] Dean M. Tullsen,et al. The shared-thread multiprocessor , 2008, ICS '08.
[12] Onur Mutlu,et al. Bottleneck identification and scheduling in multithreaded applications , 2012, ASPLOS XVII.
[13] David W. Nellans,et al. Micro-pages: increasing DRAM efficiency with locality-aware data placement , 2010, ASPLOS XV.
[14] Robert Tappan Morris,et al. Reinventing Scheduling for Multicore Systems , 2009, HotOS.
[15] Shantanu Gupta,et al. Architectural core salvaging in a multi-core processor for hard-error tolerance , 2009, ISCA '09.
[16] Koushik Chakraborty,et al. Computation spreading: employing hardware migration to specialize CMP cores on-the-fly , 2006, ASPLOS XII.
[17] Mainak Chaudhuri. PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[18] Krste Asanovic,et al. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[19] Anoop Gupta,et al. Operating system support for improving data locality on CC-NUMA compute servers , 1996, ASPLOS VII.
[20] Niraj K. Jha,et al. A 4.6Tbits/s 3.6GHz single-cycle NoC router with a novel switch allocator in 65nm CMOS , 2007, ICCD.
[21] Babak Falsafi,et al. Reactive NUCA: near-optimal block placement and replication in distributed caches , 2009, ISCA '09.
[22] Pierre Michaud. Exploiting the cache capacity of a single-chip multi-core processor with execution migration , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[23] Sangyeun Cho,et al. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[24] Jaehyuk Huh,et al. A NUCA Substrate for Flexible CMP Cache Sharing , 2007, IEEE Transactions on Parallel and Distributed Systems.
[25] Srinivas Devadas,et al. Thread Migration Prediction for Distributed Shared Caches , 2014, IEEE Computer Architecture Letters.
[26] Gu-Yeon Wei,et al. Thread motion: fine-grained power management for multi-core systems , 2009, ISCA '09.
[27] David A. Wood,et al. Managing Wire Delay in Large Chip-Multiprocessor Caches , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).