TicToc: Enabling Bandwidth-Efficient DRAM Caching for Both Hits and Misses in Hybrid Memory Systems
[1] Aamer Jaleel,et al. CAMEO: A Two-Level Memory Organization with Capacity of Main Memory and Flexibility of Hardware-Managed Cache , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[2] Gabriel H. Loh,et al. Resilient die-stacked DRAM caches , 2013, ISCA.
[3] Carole-Jean Wu,et al. SHiP: Signature-based Hit Predictor for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[4] Jinkyu Jeong,et al. Efficient footprint caching for Tagless DRAM Caches , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[5] David A. Patterson,et al. The GAP Benchmark Suite , 2015, ArXiv.
[6] Seth H. Pugsley,et al. USIMM: the Utah SImulated Memory Module , 2012.
[7] Josep Torrellas,et al. PageSeer: Using Page Walks to Trigger Page Swaps in Hybrid Memory Systems , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[8] Yan Solihin,et al. Counter-Based Cache Replacement and Bypassing Algorithms , 2008, IEEE Transactions on Computers.
[9] Alaa R. Alameldeen,et al. Transparent Hardware Management of Stacked DRAM as Part of Memory , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[10] Babak Falsafi,et al. Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[11] Parijat Dube,et al. Architectural design for next generation heterogeneous memory systems , 2010, 2010 IEEE International Memory Workshop.
[12] Srinivas Devadas,et al. Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[13] Xiao Liu,et al. Basic Performance Measurements of the Intel Optane DC Persistent Memory Module , 2019, ArXiv.
[14] John L. Henning. SPEC CPU2006 benchmark descriptions , 2006, CARN.
[15] Aamer Jaleel,et al. ACCORD: Enabling Associativity for Gigascale DRAM Caches by Coordinating Way-Install and Way-Prediction , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[16] Reena Panda,et al. SILC-FM: Subblocked InterLeaved Cache-Like Flat Memory Organization , 2016, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[17] Cheng-Chieh Huang,et al. ATCache: Reducing DRAM cache latency via a small SRAM tag cache , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[18] Aamer Jaleel,et al. BEAR: Techniques for mitigating bandwidth bloat in gigascale DRAM caches , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[19] Gabriel H. Loh,et al. Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[20] Mike O'Connor,et al. Cache coherence for GPU architectures , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[21] Gabriel H. Loh,et al. A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[22] Babak Falsafi,et al. Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache , 2013, ISCA.
[23] Gabriel H. Loh,et al. Challenges in Heterogeneous Die-Stacked and Off-Chip Memory Systems , 2012.
[24] Tajana Simunic,et al. PDRAM: A hybrid PRAM and DRAM main memory system , 2009, 2009 46th ACM/IEEE Design Automation Conference.
[25] Akhilesh Kumar,et al. Cascade Lake: Next Generation Intel Xeon Scalable Processor , 2019, IEEE Micro.
[26] C. Wilkerson,et al. A Dueling Segmented LRU Replacement Algorithm with Adaptive Bypassing , 2010.
[27] Cheng-Chieh Huang,et al. C3D: Mitigating the NUMA bottleneck via coherent DRAM caches , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[28] Mark D. Hill,et al. Efficiently enabling conventional block sizes for very large die-stacked DRAM caches , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[29] Moinuddin K. Qureshi,et al. DICE: Compressing DRAM caches for bandwidth and capacity , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[30] Aamer Jaleel,et al. SHiP++: Enhancing Signature-Based Hit Predictor for Improved Cache Performance , 2017.
[31] Rajiv Kapoor,et al. Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[32] Vijayalakshmi Srinivasan,et al. Scalable high performance main memory system using phase-change memory technology , 2009, ISCA '09.
[33] Onur Mutlu,et al. Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management , 2012, IEEE Computer Architecture Letters.
[34] Jinkyu Jeong,et al. A fully associative, tagless DRAM cache , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[35] Aamer Jaleel,et al. CANDY: Enabling coherent DRAM caches for multi-node systems , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[36] Yen-Chen Liu,et al. Knights Landing: Second-Generation Intel Xeon Phi Product , 2016, IEEE Micro.
[37] Jeffrey B. Rothman,et al. Sector cache design and performance , 2000, Proceedings 8th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (Cat. No.PR00728).
[38] Aamer Jaleel,et al. Combining HW/SW Mechanisms to Improve NUMA Performance of Multi-GPU Systems , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[39] Dean M. Tullsen,et al. MemPod: A Clustered Architecture for Efficient and Scalable Migration in Flat Address Space Multi-level Memories , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[40] Tao Zhang,et al. Building a Low Latency, Highly Associative DRAM Cache with the Buffered Way Predictor , 2016, 2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).