Architecting On-Chip DRAM Cache for Simultaneous Miss Rate and Latency Reduction
[1] N. Gura,et al. UltraSPARC T2: A highly-threaded, power-efficient, SPARC SOC , 2007, 2007 IEEE Asian Solid-State Circuits Conference.
[2] Gabriel H. Loh,et al. Extending the effectiveness of 3D-stacked DRAM caches with an adaptive multi-queue policy , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[3] Gabriel H. Loh,et al. A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[4] Li Zhao,et al. Exploring DRAM cache architectures for CMP server platforms , 2007, 2007 25th International Conference on Computer Design.
[5] Babak Falsafi,et al. Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache , 2013, ISCA.
[6] Gabriel H. Loh,et al. Zesto: A cycle-level simulator for highly detailed microarchitecture exploration , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[7] Koen De Bosschere,et al. 2FAR: A 2bcgskew Predictor Fused by an Alloyed Redundant History Skewed Perceptron Branch Predictor , 2005, J. Instr. Level Parallelism.
[8] Cheng-Chieh Huang,et al. ATCache: Reducing DRAM cache latency via a small SRAM tag cache , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[9] Yangdong Deng,et al. Interconnect characteristics of 2.5-D system integration scheme , 2001, ISPD '01.
[10] Manoj Franklin,et al. Balancing throughput and fairness in SMT processors , 2001, 2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.
[11] Greg Hamerly,et al. SimPoint 3.0: Faster and More Flexible Program Analysis , 2005 .
[12] Brad Calder,et al. Using SimPoint for accurate and efficient simulation , 2003, SIGMETRICS '03.
[13] Balaram Sinharoy,et al. The implementation of POWER7™: A highly parallel and scalable multi-core high-end server processor , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).
[14] Mark D. Hill,et al. Efficiently enabling conventional block sizes for very large die-stacked DRAM caches , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[15] Darryl Gove. CPU2006 working set size , 2007 .
[16] Steven Paul Hartman,et al. IBM POWER7 systems , 2011 .
[17] Jörg Henkel,et al. Reducing inter-core cache contention with an adaptive bank mapping policy in DRAM cache , 2013, 2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
[18] Jörg Henkel,et al. Adaptive cache management for a combined SRAM and DRAM cache hierarchy for multi-cores , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[19] Babak Falsafi,et al. Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[20] Lei Jiang,et al. Die Stacking (3D) Microarchitecture , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[21] Mark D. Hill,et al. Supporting Very Large DRAM Caches with Compound-Access Scheduling and MissMap , 2012, IEEE Micro.
[22] Jörg Henkel,et al. Simultaneously optimizing DRAM cache hit latency and miss rate via novel set mapping policies , 2013, 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES).
[23] Gabriel H. Loh,et al. Resilient die-stacked DRAM caches , 2013, ISCA.
[24] Brad Calder,et al. SimPoint 3.0: Faster and More Flexible Program Phase Analysis , 2005, J. Instr. Level Parallelism.
[25] William J. Dally,et al. Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[26] Darryl Gove,et al. CPU2006 working set size , 2007, CARN.
[27] Yan Solihin,et al. CHOP: Integrating DRAM Caches for CMP Server Platforms , 2011, IEEE Micro.
[28] Jörg Henkel,et al. Reducing latency in an SRAM/DRAM cache hierarchy via a novel Tag-Cache architecture , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).
[29] Young-Hyun Jun,et al. 8 Gb 3-D DDR3 DRAM Using Through-Silicon-Via Technology , 2009, IEEE Journal of Solid-State Circuits.
[30] Stijn Eyerman,et al. System-Level Performance Metrics for Multiprogram Workloads , 2008, IEEE Micro.
[31] S. Kim,et al. Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[32] Gabriel H. Loh,et al. 3D-Stacked Memory Architectures for Multi-core Processors , 2008, 2008 International Symposium on Computer Architecture.
[33] Ieee Circuits,et al. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems information for authors , 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.