论文信息 - Toward Efficient Programmer-Managed Two-Level Memory Hierarchies in Exascale Computers

Toward Efficient Programmer-Managed Two-Level Memory Hierarchies in Exascale Computers

Future exascale systems will require very aggressive memory systems simultaneously delivering huge storage capacities and multi-TB/s bandwidths. To achieve the bandwidth targets, in-package, die-stacked memory technologies will likely be necessary. However, these integrated memories do not provide enough capacity to achieve the overall per-node memory size requirements. As a result, conventional off-package memory (e.g., DIMMs) will still be needed. This creates a "two-level memory" (TLM) organization where a portion of the machine's memory space provides high bandwidth, and the remainder provides capacity at a lower level of performance. Effective use of such a heterogeneous memory organization may require the co-design of the software applications along with the advancements in memory architecture. In this paper, we explore the efficacy of programmer-driven approaches to managing a TLM system, using three Exascale proxy applications as case studies.

[1] Young-Hyun Jun,et al. A 1.2V 12.8GB/s 2Gb mobile Wide-I/O DRAM with 4×128 I/Os using TSV-based stacking , 2011, 2011 IEEE International Solid-State Circuits Conference.

[2] Maya Gokhale,et al. DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.

[3] Young-Hyun Jun,et al. A 1.2 V 12.8 GB/s 2 Gb Mobile Wide-I/O DRAM With 4 $\times$ 128 I/Os Using TSV Based Stacking , 2011, IEEE Journal of Solid-State Circuits.

[4] Gabriel H. Loh,et al. Challenges in Heterogeneous Die-Stacked and Off-Chip Memory Systems , 2012 .

[5] Jörg Henkel,et al. Simultaneously optimizing DRAM cache hit latency and miss rate via novel set mapping policies , 2013, 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES).

[6] FalsafiBabak,et al. Die-stacked DRAM caches for servers , 2013 .

[7] R. Hornung,et al. HYDRODYNAMICS CHALLENGE PROBLEM , 2011 .

[8] Yan Solihin,et al. CHOP: Adaptive filter-based DRAM caching for CMP server platforms , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[9] Gabriel H. Loh,et al. Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[10] Gabriel H. Loh,et al. 3D-Stacked Memory Architectures for Multi-core Processors , 2008, 2008 International Symposium on Computer Architecture.

[11] Onur Mutlu,et al. Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management , 2012, IEEE Computer Architecture Letters.

[12] Sung Kyu Lim,et al. A study of stacking limit and scaling in 3D ICs: an interconnect perspective , 2009, 2009 59th Electronic Components and Technology Conference.

[13] Hsien-Hsin S. Lee,et al. Designing 3D test wrappers for pre-bond and post-bond test of 3D embedded cores , 2011, 2011 IEEE 29th International Conference on Computer Design (ICCD).

[14] Kesheng Wu,et al. Scientific Discovery at the Exascale , 2011 .

[15] Natalie D. Enright Jerger,et al. A dual grain hit-miss detector for large Die-Stacked DRAM caches , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[16] Gabriel H. Loh,et al. A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[17] Li Zhao,et al. Exploring DRAM cache architectures for CMP server platforms , 2007, 2007 25th International Conference on Computer Design.

[18] Babak Falsafi,et al. Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache , 2013, ISCA.

[19] Lei Jiang,et al. Die Stacking (3D) Microarchitecture , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).