Hardware-oriented cache management for large-scale chip multiprocessors
暂无分享,去创建一个
[1] Per Stenström,et al. An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[2] Dirk Grunwald,et al. Predictive sequential associative cache , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[3] David A. Wood,et al. Managing Wire Delay in Large Chip-Multiprocessor Caches , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[4] Zeshan Chishti,et al. Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures , 2003, MICRO.
[5] Jih-Kwon Peir,et al. Capturing dynamic memory reference behavior with adaptive cache topology , 1998, ASPLOS VIII.
[6] Saurabh Dighe,et al. An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.
[7] Rami G. Melhem,et al. ACM: An Efficient Approach for Managing Shared Caches in Chip Multiprocessors , 2009, HiPEAC.
[8] Yale N. Patt,et al. Utility-Based Cache Partitioning , 2006 .
[9] Zeshan Chishti,et al. Optimizing replication, communication, and capacity allocation in CMPs , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[10] José F. Martínez,et al. Scavenger: A New Last Level Cache Architecture with Global Block Priority , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[11] Jean-Didier Legat,et al. Application-Specific Reconfigurable XOR-Indexing to Eliminate Cache Conflict Misses , 2006, Proceedings of the Design Automation & Test in Europe Conference.
[12] Timothy Johnson,et al. An 8-core, 64-thread, 64-bit power efficient sparc soc (niagara2) , 2007, ISPD '07.
[13] Mainak Chaudhuri. PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[14] A. Argawal,et al. Cache performance of operating systems and multiprogramming , 1988 .
[15] Moinuddin K. Qureshi. Adaptive Spill-Receive for robust high-performance caching in CMPs , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[16] Luiz André Barroso,et al. Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[17] Balaram Sinharoy,et al. POWER5 system microarchitecture , 2005, IBM J. Res. Dev..
[18] Mahmut T. Kandemir,et al. Implementation and evaluation of a migration-based NUCA design for chip multiprocessors , 2008, SIGMETRICS '08.
[19] Sarah Leilani Harris,et al. Synergistic caching in single-chip multiprocessors , 2005 .
[20] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[21] Steven K. Reinhardt,et al. A fully associative software-managed cache design , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[22] Rami G. Melhem,et al. C-AMTE: A location mechanism for flexible cache management in chip multiprocessors , 2011, J. Parallel Distributed Comput..
[23] Rami Melhem,et al. Cache Equalizer: A Cache Pressure Aware Block Placement Scheme for Large-Scale Chip Multiprocessors , 2009 .
[24] Onur Mutlu,et al. Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[25] Sangyeun Cho,et al. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[26] Yale N. Patt,et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[27] Jaehyuk Huh,et al. A NUCA Substrate for Flexible CMP Cache Sharing , 2007, IEEE Transactions on Parallel and Distributed Systems.
[28] Alberto Ros,et al. Scalable Directory Organization for Tiled CMP Architectures , 2008, CDES.
[29] Kang G. Shin,et al. Improving energy efficiency by making DRAM less randomly accessed , 2005, ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005..
[30] Krste Asanovic,et al. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[31] Yale N. Patt,et al. The V-Way cache: demand-based associativity via global replacement , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[32] David A. Wood,et al. ASR: Adaptive Selective Replication for CMP Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[33] Rajeev Balasubramonian,et al. Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[34] Simon W. Moore,et al. Low-latency virtual-channel routers for on-chip networks , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..
[35] Mahmut T. Kandemir,et al. A novel migration-based NUCA design for Chip Multiprocessors , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[36] José González,et al. The design and performance of a conflict-avoiding cache , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[37] Sangyeun Cho,et al. Taming Single-Thread Program Performance on Many Distributed On-Chip L2 Caches , 2008, 2008 37th International Conference on Parallel Processing.
[38] David K. Tam,et al. Managing Shared L2 Caches on Multicore Systems in Software , 2007 .
[39] Margaret Martonosi,et al. A dynamic compilation framework for controlling microprocessor energy and performance , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[40] Anoop Gupta,et al. Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes , 1990, ICPP.
[41] Basilio B. Fraguela,et al. Adaptive line placement with the set balancing cache , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[42] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[43] Glenn Reinman,et al. Reducing energy and delay using efficient victim caches , 2003, ISLPED '03.
[44] Mark D. Hill,et al. Virtual hierarchies to support server consolidation , 2007, ISCA '07.
[45] Brad Calder,et al. Reducing cache misses using hardware and software page placement , 1999, ICS '99.
[46] Anant Agarwal,et al. Column-associative caches: a technique for reducing the miss rate of direct-mapped caches , 1993, ISCA '93.
[47] Kunle Olukotun,et al. Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.
[48] Dam Sunwoo,et al. FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators , 2007, MICRO.
[49] Gabriel H. Loh,et al. PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches , 2009, ISCA '09.
[50] Mahmut T. Kandemir,et al. Adaptive set pinning: managing shared caches in chip multiprocessors , 2008, ASPLOS.
[51] Uri C. Weiser,et al. Utilizing shared data in chip multiprocessors with the Nahalal architecture , 2008, SPAA '08.
[52] Paul Feautrier,et al. A New Solution to Coherence Problems in Multicache Systems , 1978, IEEE Transactions on Computers.
[53] Kunle Olukotun,et al. Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency , 2007 .
[54] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[55] Jichuan Chang,et al. Cooperative Caching for Chip Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[56] D. Lenoski,et al. The SGI Origin: A ccnuma Highly Scalable Server , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[57] Dean M. Tullsen,et al. Runtime identification of cache conflict misses: The adaptive miss buffer , 2001, TOCS.
[58] Rami G. Melhem,et al. Dynamic cache clustering for chip multiprocessors , 2009, ICS '09.
[59] Babak Falsafi,et al. Reactive NUCA: near-optimal block placement and replication in distributed caches , 2009, ISCA '09.
[60] Chuanjun Zhang. Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[61] André Seznec,et al. A case for two-way skewed-associative caches , 1993, ISCA '93.
[62] D. Weiss,et al. The on-chip 3 MB subarray based 3rd level cache on an Itanium microprocessor , 2002, 2002 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (Cat. No.02CH37315).
[63] Aamer Jaleel,et al. Adaptive insertion policies for managing shared caches , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[64] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[65] Mainak Chaudhuri,et al. Pseudo-LIFO: The foundation of a new family of replacement policies for last-level caches , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[66] Aamer Jaleel,et al. Adaptive insertion policies for high performance caching , 2007, ISCA '07.
[67] Michael Zhang,et al. Victim Migration: Dynamically Adapting Between Private and Shared CMP Caches , 2005 .