Memory coherence in the age of multicores
暂无分享,去创建一个
[1] Anoop Gupta,et al. Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes , 1990, ICPP.
[2] Jaehyuk Huh,et al. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture , 2003, ISCA '03.
[3] Mainak Chaudhuri. PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[4] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[5] D. Banks,et al. Assembly and Packaging , 2006 .
[6] Henry Hoffmann,et al. On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.
[7] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[8] Babak Falsafi,et al. Selective, accurate, and timely self-invalidation using last-touch prediction , 2000, ISCA '00.
[9] Srinivas Devadas,et al. Brief announcement: distributed shared memory based on computation migration , 2011, SPAA '11.
[10] David J. Lilja,et al. So many states, so little time: verifying memory coherence in the Cray X1 , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[11] Sang Lyul Min,et al. Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps , 1992, IEEE Trans. Parallel Distributed Syst..
[12] Srinivas Devadas,et al. Scalable, accurate multicore simulation in the 1000-core era , 2011, (IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE.
[13] Rami G. Melhem,et al. A timestamp-based selective invalidation scheme for multiprocessor cache coherence , 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.
[14] Angela C. Sodan. Message-passing and shared-data programming models - wish vs. reality , 2005, 19th International Symposium on High Performance Computing Systems and Applications (HPCS'05).
[15] S. Borkar,et al. An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS , 2008, IEEE Journal of Solid-State Circuits.
[16] Srinivas Devadas,et al. Deadlock-free fine-grained thread migration , 2011, Proceedings of the Fifth ACM/IEEE International Symposium.
[17] Mark Horowitz,et al. An evaluation of directory schemes for cache coherence , 1998, ISCA '98.
[18] Babak Falsafi,et al. Reactive NUCA: near-optimal block placement and replication in distributed caches , 2009, ISCA '09.
[19] Niladrish Chatterjee,et al. Micro-pages: increasing DRAM efficiency with locality-aware data placement , 2010, ASPLOS 2010.
[20] David Wentzlaff,et al. Processor: A 64-Core SoC with Mesh Interconnect , 2010 .
[21] Stefan Rusu,et al. A 45nm 8-core enterprise Xeon ® processor , 2009 .
[22] Vivek Sarkar,et al. Baring It All to Software: Raw Machines , 1997, Computer.
[23] Coniferous softwood. GENERAL TERMS , 2003 .
[24] David A. Wood,et al. Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[25] Pierre Michaud. Exploiting the cache capacity of a single-chip multi-core processor with execution migration , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[26] Sangyeun Cho,et al. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[27] Omer Khan,et al. System-level Optimizations for Memory Access in the Execution Migration Machine ( EM 2 ) , 2011 .
[28] Michael D. Noakes,et al. The J-machine multicomputer: an architectural evaluation , 1993, ISCA '93.
[29] Krste Asanovic,et al. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[30] Anoop Gupta,et al. Operating system support for improving data locality on CC-NUMA compute servers , 1996, ASPLOS VII.
[31] Rajeev Balasubramonian,et al. Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[32] Marcelo Cintra,et al. An OS-based alternative to full hardware coherence on tiled CMPs , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[33] David W. Nellans,et al. Micro-pages: increasing DRAM efficiency with locality-aware data placement , 2010, ASPLOS XV.
[34] Omer Khan,et al. EM2: A Scalable Shared-Memory Multicore Architecture , 2010 .
[35] Mahmut T. Kandemir,et al. A novel migration-based NUCA design for Chip Multiprocessors , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.