Tiered-ReRAM: A Low Latency and Energy Efficient TLC Crossbar ReRAM Architecture

Resistive Memory (ReRAM) is promising to be used as high density storage-class memory by employing Triple-Level Cell (TLC) and crossbar structures. However, TLC crossbar ReRAM suffers from high write latency and energy due to the IR drop issue and the iterative program-and-verify procedure. In this paper, we propose Tiered-ReRAM architecture to overcome the challenges of TLC crossbar ReRAM. The proposed Tiered-ReRAM consists of three components, namely Tiered-crossbar design, Compression-based Incomplete Data Mapping (CIDM), and Compression-based Flip Scheme (CFS). Specifically, based on the observation that the magnitude of IR drops is primarily determined by the long length of bitlines in Double-Sided Ground Biasing (DSGB) crossbar arrays, Tiered-crossbar design splits each long bitline into the near and far segments by an isolation transistor, allowing the near segment to be accessed with decreased latency and energy. Moreover, in the near segments, CIDM dynamically selects the most appropriate IDM for each cache line according to the saved space by compression, which further reduces the write latency and energy with insignificant space overhead. In addition, in the far segments, CFS dynamically selects the most appropriate flip scheme for each cache line, which ensures more high resistance cells written into crossbar arrays and effectively reduces the leakage energy. For each compressed cache line, the selected IDM or flip scheme is applied on the condition that the total encoded data size will never exceed the original cache line size. The experimental results show that, on average, Tiered-ReRAM can improve the system performance by 30.5%, reduce the write latency by 35.2%, decrease the read latency by 26.1%, and reduce the energy consumption by 35.6%, compared to an aggressive baseline.

[1]  裕幸 飯田,et al.  International Technology Roadmap for Semiconductors 2003の要求清浄度について - シリコンウエハ表面と雰囲気環境に要求される清浄度, 分析方法の現状について - , 2004 .

[2]  David A. Wood,et al.  Frequent Pattern Compression: A Significance-Based Compression Scheme for L2 Caches , 2004 .

[3]  S. Narasimha,et al.  High Performance 45-nm SOI Technology with Enhanced Strain, Porous Low-k BEOL, and Immersion Lithography , 2006, 2006 International Electron Devices Meeting.

[4]  Hyunjin Lee,et al.  Flip-N-Write: A simple deterministic technique to improve PRAM write performance, energy and endurance , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[5]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[6]  Frederick T. Chen,et al.  Evidence and solution of over-RESET problem for HfOX based resistive memory with sub-ns switching speed and high endurance , 2010, 2010 International Electron Devices Meeting.

[7]  Bradford M. Beckmann,et al.  The gem5 simulator , 2011, CARN.

[8]  Heng-Yuan Lee,et al.  A 4Mb embedded SLC resistive-RAM macro with 7.2ns read-write random-access time and 160ns MLC-access capability , 2011, 2011 IEEE International Solid-State Circuits Conference.

[9]  Onur Mutlu,et al.  Base-delta-immediate compression: Practical data compression for on-chip caches , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[10]  Cong Xu,et al.  NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[11]  K. Gopalakrishnan,et al.  Large-scale (512kbit) integration of multilayer-ready access-devices based on mixed-ionic-electronic-conduction (MIEC) at 100% yield , 2012, 2012 Symposium on VLSI Technology (VLSIT).

[12]  Matthew Poremba,et al.  NVMain: An Architectural-Level Main Memory Simulator for Emerging Non-volatile Memories , 2012, 2012 IEEE Computer Society Annual Symposium on VLSI.

[13]  Onur Mutlu,et al.  A case for exploiting subarray-level parallelism (SALP) in DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[14]  A. Robert Calderbank,et al.  Coset coding to extend the lifetime of memory , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[15]  Norman P. Jouppi,et al.  Understanding the trade-offs in multi-level cell ReRAM memory design , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[16]  Onur Mutlu,et al.  Tiered-latency DRAM: A low latency and low cost DRAM architecture , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[17]  O Seongil,et al.  Reducing memory access latency with asymmetric DRAM bank organizations , 2013, ISCA.

[18]  Bing Chen,et al.  RRAM Crossbar Array With Cell Selection Device: A Device and Circuit Interaction Study , 2013, IEEE Transactions on Electron Devices.

[19]  Mahmut T. Kandemir,et al.  Evaluating STT-RAM as an energy-efficient main memory alternative , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[20]  Cong Xu,et al.  Low power multi-level-cell resistive memory design with incomplete data mapping , 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).

[21]  Onur Mutlu,et al.  Linearly compressed pages: A low-complexity, low-latency main memory compression framework , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[22]  Onur Mutlu,et al.  Adaptive-latency DRAM: Optimizing DRAM timing for the common-case , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[23]  Tao Zhang,et al.  Overcoming the challenges of crossbar resistive memory architectures , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[24]  Rami G. Melhem,et al.  CAFO: Cost aware flip optimization for asymmetric memories , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[25]  Rajeev Balasubramonian,et al.  Improving memristor memory with sneak current sharing , 2015, 2015 33rd IEEE International Conference on Computer Design (ICCD).

[26]  Hang Zhang,et al.  Leader: Accelerating ReRAM-based main memory by leveraging access latency discrepancy in crossbar arrays , 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[27]  Mattan Erez,et al.  Bit-Plane Compression: Transforming Data for Better Compression in Many-Core Architectures , 2016, ISCA.

[28]  Onur Mutlu,et al.  A case for toggle-aware compression for GPU systems , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[29]  Alaa R. Alameldeen,et al.  Base-Victim Compression: An Opportunistic Cache Compression Architecture , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[30]  Amirali Ghofrani,et al.  A low-power hybrid reconfigurable architecture for resistive random-access memories , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[31]  Kartik Mohanram,et al.  CompEx: Compression-expansion coding for energy, latency, and lifetime improvements in MLC/TLC NVM , 2016, HPCA.

[32]  Onur Mutlu,et al.  Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization , 2016, SIGMETRICS.

[33]  Yang Zhang,et al.  A novel ReRAM-based main memory structure for optimizing access latency and reliability , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).

[34]  Yang Zhang,et al.  DAWS: Exploiting Crossbar Characteristics for Improving Write Performance of High Density Resistive Memory , 2017, 2017 IEEE International Conference on Computer Design (ICCD).

[35]  Gennady Pekhimenko,et al.  Design-Induced Latency Variation in Modern DRAM Chips , 2016, Proc. ACM Meas. Anal. Comput. Syst..

[36]  Jun Yang,et al.  Constructing fast and energy efficient 1TnR based ReRAM crossbar memory , 2017, 2017 18th International Symposium on Quality Electronic Design (ISQED).

[37]  Lei Zhao,et al.  Speeding up crossbar resistive memory by exploiting in-memory data patterns , 2017, 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[38]  Yang Zhang,et al.  CACF: A Novel Circuit Architecture Co-optimization Framework for Improving Performance, Reliability and Energy of ReRAM-based Main Memory System , 2018, ACM Trans. Archit. Code Optim..

[39]  Dan Feng,et al.  Asymmetric-ReRAM: A Low Latency and High Reliability Crossbar Resistive Memory Architecture , 2018, 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom).