Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache
暂无分享,去创建一个
[1] Mark D. Hill,et al. Supporting Very Large DRAM Caches with Compound-Access Scheduling and MissMap , 2012, IEEE Micro.
[2] Gabriel H. Loh,et al. Extending the effectiveness of 3D-stacked DRAM caches with an adaptive multi-queue policy , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[3] Brian Rogers,et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling , 2009, ISCA '09.
[4] Hsien-Hsin S. Lee,et al. Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[5] Peter A. Franaszek,et al. Victim management in a cache hierarchy , 2006, IBM J. Res. Dev..
[6] Yan Solihin,et al. CHOP: Adaptive filter-based DRAM caching for CMP server platforms , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[7] Gabriel H. Loh,et al. Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[8] Mikko H. Lipasti,et al. Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking , 2005, ISCA 2005.
[9] Onur Mutlu,et al. Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management , 2012, IEEE Computer Architecture Letters.
[10] SeznecA.. Decoupled sectored caches , 1994 .
[11] Li Zhao,et al. Exploring DRAM cache architectures for CMP server platforms , 2007, 2007 25th International Conference on Computer Design.
[12] Babak Falsafi,et al. NOC-Out: Microarchitecting a Scale-Out Processor , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[13] Yangdong Deng,et al. Interconnect characteristics of 2.5-D system integration scheme , 2001, ISPD '01.
[14] Thomas F. Wenisch,et al. Spatial Memory Streaming , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[15] Mikko H. Lipasti,et al. Improving multiprocessor performance with coarse-grain coherence tracking , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[16] Alan Gara,et al. Exploiting eDRAM bandwidth with data prefetching: simulation and measurements , 2007, 2007 25th International Conference on Computer Design.
[17] Martin Burtscher,et al. Bridging the processor-memory performance gap with 3D IC technology , 2005, IEEE Design & Test of Computers.
[18] Zhe Wang,et al. Improving writeback efficiency with decoupled last-write prediction , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[19] Rajeev Balasubramonian,et al. Optimizing communication and capacity in a 3D stacked reconfigurable cache hierarchy , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[20] Babak Falsafi,et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.
[21] Hsien-Hsin S. Lee,et al. An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[22] Wei-Fen Lin,et al. Filtering superfluous prefetches using density vectors , 2001, Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001.
[23] Krisztián Flautner,et al. PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor , 2006, ASPLOS XII.
[24] Babak Falsafi,et al. Accurate and complexity-effective spatial pattern prediction , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[25] Doe Hyun Yoon,et al. The dynamic granularity memory system , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[26] Thomas F. Wenisch,et al. SimFlex: Statistical Sampling of Computer System Simulation , 2006, IEEE Micro.
[27] Babak Falsafi,et al. Reactive NUCA: near-optimal block placement and replication in distributed caches , 2009, ISCA '09.
[28] Yuan Xie,et al. Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[29] Sandhya Dwarkadas,et al. Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[30] Babak Falsafi,et al. Dead-block prediction & dead-block correlating prefetchers , 2001, ISCA 2001.
[31] Babak Falsafi,et al. Scale-out processors , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[32] Mark D. Hill,et al. Efficiently enabling conventional block sizes for very large die-stacked DRAM caches , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[33] Daniel A. Jiménez,et al. Reducing network-on-chip energy consumption through spatial locality speculation , 2011, Proceedings of the Fifth ACM/IEEE International Symposium.
[34] Sanjeev Kumar,et al. Exploiting spatial locality in data caches using spatial footprints , 1998, ISCA.
[35] Roland E. Wunderlich,et al. SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..
[36] Giovanni De Micheli,et al. CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers , 2012, 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip.
[37] Babak Falsafi,et al. Toward Dark Silicon in Servers , 2011, IEEE Micro.
[38] Bruce Jacob,et al. DRAMSim2: A Cycle Accurate Memory System Simulator , 2011, IEEE Computer Architecture Letters.
[39] Gabriel H. Loh,et al. 3D-Stacked Memory Architectures for Multi-core Processors , 2008, 2008 International Symposium on Computer Architecture.
[40] A. Seznec,et al. Decoupled sectored caches: conciliating low tag implementation cost and low miss ratio , 1994, Proceedings of 21 International Symposium on Computer Architecture.
[41] André Seznec,et al. Decoupled sectored caches: conciliating low tag implementation cost , 1994, ISCA '94.