C3D: Mitigating the NUMA bottleneck via coherent DRAM caches
Cheng-Chieh Huang | Boris Grot | Vijay Nagarajan | Rakesh Kumar | Marco Elver
[1] Babak Falsafi, et al. JETTY: filtering snoops for reduced energy consumption in SMP servers, 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[2] Andreas Moshovos. RegionScout: exploiting coarse grain sharing in snoop-based coherence, 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[3] Vivien Quéma, et al. Traffic management: a holistic approach to memory placement on NUMA systems, 2013, ASPLOS '13.
[4] John L. Hennessy, et al. An evaluation of a commercial CC-NUMA architecture-the CONVEX Exemplar SPP1200, 1997, Proceedings 11th International Parallel Processing Symposium.
[5] Paul Feautrier, et al. A New Solution to Coherence Problems in Multicache Systems, 1978, IEEE Transactions on Computers.
[6] Cheng-Chieh Huang, et al. ATCache: Reducing DRAM cache latency via a small SRAM tag cache, 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[7] Babak Falsafi, et al. Multi-grain coherence directories, 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[8] Josep Torrellas, et al. Cache-Only Memory Architectures, 1999, Computer.
[9] Aamer Jaleel, et al. BEAR: Techniques for mitigating bandwidth bloat in gigascale DRAM caches, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[10] Harish Patil, et al. Pin: building customized program analysis tools with dynamic instrumentation, 2005, PLDI '05.
[11] Michael L. Scott, et al. Simple but effective techniques for NUMA memory management, 1989, SOSP '89.
[12] R. Manikantan, et al. Bi-Modal DRAM Cache: A Scalable and Effective Die-Stacked DRAM Cache, 2014, MICRO 2014.
[13] Fredrik Larsson, et al. Simics: A Full System Simulation Platform, 2002, Computer.
[14] Anoop Gupta, et al. Parallel computer architecture - a hardware / software approach, 1998.
[15] John B. Carter, et al. An argument for simple COMA, 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.
[16] Babak Falsafi, et al. Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[17] Josep Torrellas, et al. Reducing remote conflict misses: NUMA with remote cache versus COMA, 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[18] Vijayalakshmi Srinivasan, et al. A Tagless Coherence Directory, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[19] Jinkyu Jeong, et al. A fully associative, tagless DRAM cache, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[20] Josep Torrellas, et al. Enhancing memory use in Simple COMA: Multiplexed Simple COMA, 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[21] Wolfgang E. Nagel, et al. Comparing cache architectures and coherency protocols on x86-64 multicore SMP systems, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[22] Babak Falsafi, et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware, 2012, ASPLOS XVII.
[23] Gabriel H. Loh, et al. A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch, 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[24] R. Govindarajan, et al. Bi-Modal DRAM Cache: Improving Hit Rate, Hit Latency and Bandwidth, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[25] Babak Falsafi, et al. Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache, 2013, ISCA.
[26] Mark Horowitz, et al. An evaluation of directory schemes for cache coherence, 1998, ISCA '98.
[27] Kevin M. Lepak, et al. Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor, 2010, IEEE Micro.
[28] David L. Dill, et al. The Murphi Verification System, 1996, CAV.
[29] Kai Li, et al. The PARSEC benchmark suite: Characterization and architectural implications, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[30] Babak Falsafi, et al. Reactive NUCA: near-optimal block placement and replication in distributed caches, 2009, ISCA '09.
[31] Christian Bienia, et al. Benchmarking modern multiprocessors, 2011.
[32] Yan Solihin, et al. CHOP: Adaptive filter-based DRAM caching for CMP server platforms, 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[33] Gabriel H. Loh, et al. Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design, 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[34] Erik Hagersten, et al. DDM - A Cache-Only Memory Architecture, 1992, Computer.
[35] Anoop Gupta, et al. Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes, 1990, ICPP.
[36] Matthias S. Müller, et al. Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System, 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[37] Mikko H. Lipasti, et al. Coarse-Grain Coherence Tracking: RegionScout and Region Coherence Arrays, 2006, IEEE Micro.
[38] Andreas Moshovos. RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence, 2005, ISCA 2005.
[39] David A. Wood, et al. A Primer on Memory Consistency and Cache Coherence, 2012, Synthesis Lectures on Computer Architecture.
[40] Mark D. Hill, et al. Efficiently enabling conventional block sizes for very large die-stacked DRAM caches, 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).