Exploiting In-Memory Data Patterns for Performance Improvement on Crossbar Resistive Memory

Resistive memory (ReRAM) has emerged as a promising nonvolatile memory technology that may replace a significant portion of DRAM in future computer systems. ReRAM has many advantages, such as high density, low standby power, and good scalability. In a crossbar architecture, a ReRAM cell can achieve the smallest theoretical size in fabrication, which is ideal for constructing dense, large-capacity memory. However, the crossbar cell structure suffers from a variety of reliability issues that stem from large voltage drops on long wires. To ensure reliable operation, ReRAM writes conservatively use the worst-case access latency of all cells in a ReRAM array, which leads to significant performance degradation and wasted dynamic energy. In this article, we study the correlation between ReRAM cell switching latency and the number of cells in the low-resistance state (LRS) along bitlines, and propose to dynamically speed up write operations based on bitline data patterns, i.e., the number of LRS cells present in the bitlines. We leverage the intrinsic in-memory processing capability of the ReRAM crossbar and propose a low-overhead runtime profiler that effectively tracks the data patterns in different bitlines. To reduce write latency further, we employ data compression and a row-address-dependent memory data layout to reduce the number of LRS cells on bitlines. Moreover, we present two optimization techniques, selective profiling and fine-grained profiling, to mitigate the energy overhead of tracking bitline data patterns. The experimental results show that, on average, our design improves system performance by 20.5% and 14.2%, and reduces memory dynamic energy by 20.3% and 12.6%, compared to the baseline and the state-of-the-art crossbar design, respectively.
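
The core idea, choosing a write latency from the number of LRS cells on the target bitlines instead of always using the worst case, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the encoding of LRS as 1, the function names, and the latency tiers are hypothetical placeholders, not the article's profiler design or measured values.

```python
from typing import List, Sequence, Tuple

# Hypothetical tiers: (max fraction of LRS cells on a bitline, write latency in ns).
# Placeholder numbers for illustration only; they are not taken from the article.
LATENCY_TIERS: Sequence[Tuple[float, int]] = (
    (0.25, 40),
    (0.50, 60),
    (0.75, 80),
    (1.00, 100),  # worst case: the bitline is entirely LRS
)


def lrs_counts(array: List[List[int]]) -> List[int]:
    """Count LRS cells per bitline (column). Assumes 1 encodes LRS, 0 encodes HRS."""
    return [sum(column) for column in zip(*array)]


def write_latency_ns(array: List[List[int]], bitlines: Sequence[int]) -> int:
    """Pick a write latency from the most LRS-heavy bitline touched by the write."""
    rows = len(array)
    counts = lrs_counts(array)
    worst_fraction = max(counts[b] for b in bitlines) / rows
    for limit, latency in LATENCY_TIERS:
        if worst_fraction <= limit:
            return latency
    return LATENCY_TIERS[-1][1]


# A toy 4x4 crossbar: rows are wordlines, columns are bitlines.
crossbar = [
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

fast = write_latency_ns(crossbar, [1, 2])  # bitlines with few LRS cells
slow = write_latency_ns(crossbar, [0, 2])  # bitline 0 is all LRS
```

In this sketch a write touching only sparsely populated bitlines gets the shortest tier, while any write touching a fully LRS bitline falls back to the worst-case latency. Reducing the LRS count on bitlines, as the compression and data layout techniques aim to do, would move more writes into the faster tiers.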
