Judicious Thread Migration When Accessing Distributed Shared Caches
[1] Rajeev Balasubramonian, et al. Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches, 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[2] Aamer Jaleel, et al. Analyzing Parallel Programs with PIN, 2010, Computer.
[3] Jaehyuk Huh, et al. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture, 2003, ISCA '03.
[4] Richard J. Lipton, et al. A Massive Memory Machine, 1984, IEEE Transactions on Computers.
[5] Stefan Rusu, et al. A 45nm 8-core enterprise Xeon® processor, 2009.
[6] Shantanu Gupta, et al. Architectural core salvaging in a multi-core processor for hard-error tolerance, 2009, ISCA '09.
[7] Mainak Chaudhuri. PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches, 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[8] Henry Hoffmann, et al. On-Chip Interconnection Architecture of the Tile Processor, 2007, IEEE Micro.
[9] Robert Tappan Morris, et al. Reinventing Scheduling for Multicore Systems, 2009, HotOS.
[10] S. Borkar, et al. An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS, 2008, IEEE Journal of Solid-State Circuits.
[11] Wilson C. Hsieh, et al. Computation migration: enhancing locality for distributed-memory parallel systems, 1993, PPOPP '93.
[12] Zeshan Chishti, et al. Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures, 2003, MICRO.
[13] Srinivas Devadas, et al. Brief announcement: distributed shared memory based on computation migration, 2011, SPAA '11.
[14] Pierre Michaud. Exploiting the cache capacity of a single-chip multi-core processor with execution migration, 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[15] Sangyeun Cho, et al. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation, 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[16] Jaehyuk Huh, et al. A NUCA Substrate for Flexible CMP Cache Sharing, 2007, IEEE Transactions on Parallel and Distributed Systems.
[17] Koushik Chakraborty, et al. Computation spreading: employing hardware migration to specialize CMP cores on-the-fly, 2006, ASPLOS XII.
[18] Babak Falsafi, et al. Reactive NUCA: near-optimal block placement and replication in distributed caches, 2009, ISCA '09.
[19] Srinivas Devadas, et al. Directoryless shared memory coherence using execution migration, 2011.
[20] David A. Wood, et al. Managing Wire Delay in Large Chip-Multiprocessor Caches, 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[21] David Wentzlaff, et al. Processor: A 64-Core SoC with Mesh Interconnect, 2010.
[22] Srinivas Devadas, et al. Deadlock-free fine-grained thread migration, 2011, Proceedings of the Fifth ACM/IEEE International Symposium.
[23] George Kurian, et al. Graphite: A distributed parallel simulator for multicores, 2010, HPCA-16 2010, The Sixteenth International Symposium on High-Performance Computer Architecture.
[24] Anoop Gupta, et al. The SPLASH-2 programs: characterization and methodological considerations, 1995, ISCA.
[25] Omer Khan, et al. System-level Optimizations for Memory Access in the Execution Migration Machine (EM2), 2011.
[26] Krste Asanovic, et al. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors, 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[27] Anoop Gupta, et al. Operating system support for improving data locality on CC-NUMA compute servers, 1996, ASPLOS VII.
[28] Vivek Sarkar, et al. Baring It All to Software: Raw Machines, 1997, Computer.
[30] Gu-Yeon Wei, et al. Thread motion: fine-grained power management for multi-core systems, 2009, ISCA '09.
[31] David W. Nellans, et al. Micro-pages: increasing DRAM efficiency with locality-aware data placement, 2010, ASPLOS XV.
[32] Doug Burger, et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches, 2002, ASPLOS X.