论文信息 - Monarch: A Durable Polymorphic Memory For Data Intensive Applications

Monarch: A Durable Polymorphic Memory For Data Intensive Applications

3D die stacking has often been proposed to build large-scale DRAM-based caches. Unfortunately, the power and performance overheads of DRAM limit the efficiency of high-bandwidth memories. Also, DRAM is facing serious scalability challenges that make alternative technologies more appealing. This paper examines Monarch, a resistive 3D stacked memory based on a novel reconfigurable crosspoint array called XAM. The XAM array is capable of switching between random access and content-addressable modes, which enables Monarch (i) to better utilize the in-package bandwidth and (ii) to satisfy both the random access memory and associative search requirements of various applications. Moreover, the Monarch controller ensures a given target lifetime for the resistive stack. Our simulation results on a set of parallel memory-intensive applications indicate that Monarch outperforms an ideal DRAM caching by 1.21× on average. For in-memory hash table and string matching workloads, Monarch improves performance up to 12× over the conventional high bandwidth memories.

Mahdi Nazm Bojnordi | Ananth Krishna Prasad

[1] David H. Bailey,et al. The Nas Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..

[2] Mahdi Nazm Bojnordi,et al. R-Cache: A Highly Set-Associative In-Package Cache Using Memristive Arrays , 2018, 2018 IEEE 36th International Conference on Computer Design (ICCD).

[3] F. Zeng,et al. Recent progress in resistive random access memories: Materials, switching mechanisms, and performance , 2014 .

[4] Tao Zhang,et al. RC-NVM: Enabling Symmetric Row and Column Memory Accesses for In-memory Databases , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[5] Tao Zhang,et al. Overcoming the challenges of crossbar resistive memory architectures , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[6] Cong Xu,et al. NVSim-CAM: A circuit-level simulator for emerging nonvolatile memory based Content-Addressable Memory , 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[7] Li Zhang,et al. NVMcached: An NVM-based Key-Value Cache , 2016, APSys.

[8] Jose Renau,et al. ESESC: A fast multicore simulator using Time-Based Sampling , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[9] Aamer Jaleel,et al. ACCORD: Enabling Associativity for Gigascale DRAM Caches by Coordinating Way-Install and Way-Prediction , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[10] Engin Ipek,et al. Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning , 2017, 2017 Fifth Berkeley Symposium on Energy Efficient Electronic Systems & Steep Transistors Workshop (E3S).

[11] Daniel Sánchez,et al. Jenga: Software-defined cache hierarchies , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[12] Randall L. Geiger,et al. VLSI Design Techniques for Analog and Digital Circuits , 1989 .

[13] Aamer Jaleel,et al. BEAR: Techniques for mitigating bandwidth bloat in gigascale DRAM caches , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[14] C. Zambelli,et al. Multilevel HfO2-based RRAM devices for low-power neuromorphic networks , 2019, APL Materials.

[15] Mark D. Hill,et al. Efficiently enabling conventional block sizes for very large die-stacked DRAM caches , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[16] Song Jiang,et al. Workload analysis of a large-scale key-value store , 2012, SIGMETRICS '12.

[17] Vijayalakshmi Srinivasan,et al. Enhancing lifetime and security of PCM-based Main Memory with Start-Gap Wear Leveling , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[18] Omer Khan,et al. CRONO: A Benchmark Suite for Multithreaded Graph Algorithms Executing on Futuristic Multicores , 2015, 2015 IEEE International Symposium on Workload Characterization.

[19] Gabriel H. Loh,et al. Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[20] Yannis Manolopoulos,et al. Advanced Database Indexing , 1999, Advances in Database Systems.

[21] Christoforos E. Kozyrakis,et al. Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[22] Jing Li,et al. Liquid Silicon-Monona: A Reconfigurable Memory-Oriented Computing Fabric with Scalable Multi-Context Support , 2018, ASPLOS.

[23] Tao Jiang,et al. A Fast Algorithm for Approximate String Matching on Gene Sequences , 2005, CPM.

[24] Srinivas Devadas,et al. Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[25] William J. Dally,et al. Fine-Grained DRAM: Energy-Efficient DRAM for Extreme Bandwidth Systems , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[26] Jie Wu,et al. Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory , 2018, OSDI.

[27] Adam Silberstein,et al. Benchmarking cloud serving systems with YCSB , 2010, SoCC '10.

[28] Rachata Ausavarungnirun,et al. Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks , 2018, ASPLOS.

[29] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[30] Jin Xiong,et al. HiKV: A Hybrid Index Key-Value Store for DRAM-NVM Memory Systems , 2017, USENIX Annual Technical Conference.

[31] Avinash Sodani,et al. Knights landing (KNL): 2nd Generation Intel® Xeon Phi processor , 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).

[32] Masoud Dehyadegari,et al. Memristive Data Ranking , 2021, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA).

[33] Onur Mutlu,et al. Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation , 2016, 2016 IEEE 34th International Conference on Computer Design (ICCD).

[34] Xiao Liu,et al. Basic Performance Measurements of the Intel Optane DC Persistent Memory Module , 2019, ArXiv.

[35] Yu Cao,et al. New generation of predictive technology model for sub-45nm design exploration , 2006, 7th International Symposium on Quality Electronic Design (ISQED'06).

[36] Hongyu Miao,et al. StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory , 2019, ASPLOS.

[37] Cong Xu,et al. NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.