CLR-DRAM: A Low-Cost DRAM Architecture Enabling Dynamic Capacity-Latency Trade-Off

DRAM is the prevalent main memory technology, but its long access latency can limit the performance of many workloads. Although prior works provide DRAM designs that reduce DRAM access latency, their reduced storage capacities hinder the performance of workloads that need large memory capacity. Because the capacity-latency trade-off is fixed at design time, previous works cannot achieve maximum performance under very different and dynamic workload demands.This paper proposes Capacity-Latency-Reconfigurable DRAM (CLR-DRAM), a new DRAM architecture that enables dynamic capacity-latency trade-off at low cost. CLR-DRAM allows dynamic reconfiguration of any DRAM row to switch between two operating modes: 1) max-capacity mode, where every DRAM cell operates individually to achieve approximately the same storage density as a density-optimized commodity DRAM chip and 2) high-performance mode, where two adjacent DRAM cells in a DRAM row and their sense amplifiers are coupled to operate as a single low-latency logical cell driven by a single logical sense amplifier.We implement CLR-DRAM by adding isolation transistors in each DRAM subarray. Our evaluations show that CLR-DRAM can improve system performance and DRAM energy consumption by 18.6% and 29.7% on average with four-core multiprogrammed workloads. We believe that CLR-DRAM opens new research directions for a system to adapt to the diverse and dynamically changing memory capacity and access latency demands of workloads.

[1]  Rachata Ausavarungnirun,et al.  The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework , 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).

[2]  Onur Mutlu,et al.  Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization , 2016, SIGMETRICS.

[3]  Sally A. McKee,et al.  Hitting the memory wall: implications of the obvious , 1995, CARN.

[4]  Onur Mutlu,et al.  In-DRAM Bulk Bitwise Execution Engine , 2019, ArXiv.

[5]  Yuan Xie,et al.  ProactiveDRAM: A DRAM-initiated retention management scheme , 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).

[6]  Hideto Hidaka,et al.  The cache DRAM architecture: a DRAM with an on-chip cache memory , 1990, IEEE Micro.

[7]  Mor Harchol-Balter,et al.  Thread Cluster Memory Scheduling , 2011, IEEE Micro.

[8]  Jun Yang,et al.  Restore truncation for performance improvement in future DRAM systems , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[9]  Onur Mutlu,et al.  Detecting and Mitigating Data-Dependent DRAM Failures by Exploiting Current Memory Content , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[10]  Vijay Janapa Reddi,et al.  PIN: a binary instrumentation tool for computer architecture research and education , 2004, WCAE '04.

[11]  Stijn Eyerman,et al.  System-Level Performance Metrics for Multiprogram Workloads , 2008, IEEE Micro.

[12]  Rachata Ausavarungnirun,et al.  Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks , 2018, ASPLOS.

[13]  Onur Mutlu,et al.  Runahead execution: an alternative to very large instruction windows for out-of-order processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[14]  Yang Wang,et al.  How Does the Workload Look Like in Production Cloud? Analysis and Clustering of Workloads on Alibaba Cluster Trace , 2018, 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS).

[15]  O Seongil,et al.  Row-buffer decoupling: A case for low-latency DRAM microarchitecture , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[16]  Mayu Aoki,et al.  A precise on-chip voltage generator for a gigascale DRAM with a negative word-line scheme , 1999 .

[17]  Wongyu Shin,et al.  Multiple Clone Row DRAM: A low latency and area optimized DRAM , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[18]  Yun Chen,et al.  Supporting Differentiated Services in Computers via Programmable Architecture for Resourcing-on-Demand (PARD) , 2015, ASPLOS.

[19]  Onur Mutlu,et al.  Research Problems and Opportunities in Memory Systems , 2014, Supercomput. Front. Innov..

[20]  Onur Mutlu,et al.  Processing-in-memory: A workload-driven perspective , 2019, IBM J. Res. Dev..

[21]  Maurice V. Wilkes,et al.  The memory gap and the future of high performance memories , 2001, CARN.

[22]  Onur Mutlu,et al.  Understanding Reduced-Voltage Operation in Modern DRAM Devices , 2017, Proc. ACM Meas. Anal. Comput. Syst..

[23]  Stefan Mangard,et al.  DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks , 2015, USENIX Security Symposium.

[24]  Y. Nakagome,et al.  Trends in low-power RAM circuit technologies , 1995 .

[25]  Ninghui Sun,et al.  Labeled RISC-V: A New Perspective on Software-Defined Architecture , 2017 .

[26]  Onur Mutlu,et al.  Memory scaling: A systems architecture perspective , 2013, 2013 5th IEEE International Memory Workshop.

[27]  Feng Lin,et al.  DRAM Circuit Design: Fundamental and High-Speed Topics , 2007 .

[28]  Onur Mutlu,et al.  CROW: A Low-Cost Substrate for Improving DRAM Performance, Energy Efficiency, and Reliability , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).

[29]  Onur Mutlu,et al.  ChargeCache: Reducing DRAM latency by exploiting row access locality , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[30]  Onur Mutlu,et al.  Low-Cost Inter-Linked Subarrays (LISA): Enabling fast inter-subarray data movement in DRAM , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[31]  Jongmoo Choi,et al.  Decoupled Direct Memory Access: Isolating CPU and IO Traffic by Leveraging a Dual-Data-Port DRAM , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).

[32]  Sheng Di,et al.  Characterization and Comparison of Cloud versus Grid Workloads , 2012, 2012 IEEE International Conference on Cluster Computing.

[33]  Young-Hyun Jun,et al.  Simultaneous Reverse Body and Negative Word-Line Biasing Control Scheme for Leakage Reduction of DRAM , 2011, IEEE Journal of Solid-State Circuits.

[34]  Onur Mutlu,et al.  The efficacy of error mitigation techniques for DRAM retention failures: a comparative experimental study , 2014, SIGMETRICS '14.

[35]  Kiyoo Itoh,et al.  Long-Retention-Time, High-Speed DRAM Array with 12-F2 Twin Cell for Sub 1-V Operation , 2007, IEICE Trans. Electron..

[36]  Richard Veras,et al.  RAIDR: Retention-aware intelligent DRAM refresh , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[37]  Hiroyuki Kobayashi,et al.  Fast cycle RAM (FCRAM); a 20-ns random row access, pipe-lined operating DRAM , 1998, 1998 Symposium on VLSI Circuits. Digest of Technical Papers (Cat. No.98CH36215).

[38]  Nick Knupffer Intel Corporation , 2018, The Grants Register 2019.

[39]  Onur Mutlu,et al.  SMASH: Co-designing Software Compression and Hardware-Accelerated Indexing for Efficient Sparse Matrix Operations , 2019, MICRO.

[40]  Onur Mutlu,et al.  A case for exploiting subarray-level parallelism (SALP) in DRAM , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[41]  Masashi Horiguchi,et al.  Nanoscale Memory Repair , 2011, Integrated Circuits and Systems.

[42]  Charles Reiss,et al.  Towards understanding heterogeneous clouds at scale : Google trace analysis , 2012 .

[43]  Onur Mutlu,et al.  VRL-DRAM: Improving DRAM Performance via Variable Refresh Latency , 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).

[44]  Onur Mutlu,et al.  Simultaneous Multi-Layer Access , 2016, ACM Trans. Archit. Code Optim..

[45]  Onur Mutlu,et al.  AVATAR: A Variable-Retention-Time (VRT) Aware Refresh for DRAM Systems , 2015, 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[46]  Onur Mutlu,et al.  DSPatch: Dual Spatial Pattern Prefetcher , 2019, MICRO.

[47]  Kazuaki Murakami,et al.  Optimizing the DRAM refresh count for merged DRAM/logic LSIs , 1998, Proceedings. 1998 International Symposium on Low Power Electronics and Design (IEEE Cat. No.98TH8379).

[48]  Binoy Ravindran,et al.  Quantifying Memory Underutilization in HPC Systems and Using it to Improve Performance via Architecture Support , 2019, MICRO.

[49]  Kevin K. Chang,et al.  Understanding and Improving the Latency of DRAM-Based Memory Systems , 2017, ArXiv.

[50]  Wayne H. Wolf,et al.  MediaBench II video: Expediting the next generation of video systems research , 2009, Microprocess. Microsystems.

[51]  Björn Andersson,et al.  Coordinated Bank and Cache Coloring for Temporal Protection of Memory Accesses , 2013, 2013 IEEE 16th International Conference on Computational Science and Engineering.

[52]  Onur Mutlu,et al.  Memory Performance Attacks: Denial of Memory Service in Multi-Core Systems , 2007, USENIX Security Symposium.

[53]  Onur Mutlu,et al.  What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study , 2018, SIGMETRICS.

[54]  Onur Mutlu,et al.  PARBOR: An Efficient System-Level Technique to Detect Data-Dependent Failures in DRAM , 2016, 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

[55]  Onur Mutlu,et al.  Solar-DRAM: Reducing DRAM Access Latency by Exploiting the Variation in Local Bitlines , 2018, 2018 IEEE 36th International Conference on Computer Design (ICCD).

[56]  Onur Mutlu,et al.  The reach profiler (REAPER): Enabling the mitigation of DRAM retention failures via profiling at aggressive conditions , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[57]  O Seongil,et al.  Reducing memory access latency with asymmetric DRAM bank organizations , 2013, ISCA.

[58]  Onur Mutlu,et al.  An experimental study of data retention behavior in modern DRAM devices: implications for retention time profiling mechanisms , 2013, ISCA.

[59]  Onur Mutlu,et al.  Reducing DRAM Latency via Charge-Level-Aware Look-Ahead Partial Restoration , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[60]  Dae-Hyun Kim,et al.  ArchShield: architectural framework for assisting DRAM scaling by tolerating high error rates , 2013, ISCA.

[61]  Daehyun Kim,et al.  ECC-ASPIRIN: An ECC-assisted post-package repair scheme for aging errors in DRAMs , 2016, 2016 IEEE 34th VLSI Test Symposium (VTS).

[62]  Onur Mutlu,et al.  A Case for Memory Content-Based Detection and Mitigation of Data-Dependent Failures in DRAM , 2017, IEEE Computer Architecture Letters.

[63]  Dean M. Tullsen,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[64]  Onur Mutlu,et al.  Self-Optimizing Memory Controllers: A Reinforcement Learning Approach , 2008, 2008 International Symposium on Computer Architecture.

[65]  Brad Calder,et al.  SimPoint 3.0: Faster and More Flexible Program Phase Analysis , 2005, J. Instr. Level Parallelism.

[66]  Onur Mutlu,et al.  Adaptive-latency DRAM: Optimizing DRAM timing for the common-case , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[67]  Tao Zhang,et al.  Half-DRAM: A high-bandwidth and low-power DRAM architecture from the rethinking of fine-grained activation , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[68]  Donghyuk Lee,et al.  Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity , 2016, ArXiv.

[69]  Gu-Yeon Wei,et al.  Profiling a warehouse-scale computer , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[70]  Rachata Ausavarungnirun,et al.  Reducing DRAM Latency by Exploiting Design-Induced Latency Variation in Modern DRAM Chips , 2016, ArXiv.

[72]  Bruce Jacob,et al.  Flexible auto-refresh: Enabling scalable and energy-efficient DRAM refresh reductions , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[73]  Onur Mutlu,et al.  Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[74]  Kunle Olukotun,et al.  The Future of Microprocessors , 2005, ACM Queue.

[75]  Eric Rotenberg,et al.  Retention-aware placement in DRAM (RAPID): software methods for quasi-non-volatile DRAM , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[76]  Rachata Ausavarungnirun,et al.  RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[77]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[78]  Jin-Young Kim,et al.  An 8 Gb/s/pin 9.6 ns Row-Cycle 288 Mb Deca-Data Rate SDRAM With an I/O Error Detection Scheme , 2006, IEEE Journal of Solid-State Circuits.

[79]  Onur Mutlu,et al.  Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[80]  David A. Wood,et al.  A comparative analysis of microarchitecture effects on CPU and GPU memory system behavior , 2014, 2014 IEEE International Symposium on Workload Characterization (IISWC).

[81]  Onur Mutlu,et al.  EDEN: Enabling Energy-Efficient, High-Performance Deep Neural Network Inference Using Approximate DRAM , 2019, MICRO.

[82]  Yusuf Leblebici,et al.  Subthreshold leakage reduction: A comparative study of SCL and CMOS design , 2009, 2009 IEEE International Symposium on Circuits and Systems.

[83]  David Roberts,et al.  Binary Star: Coordinated Reliability in Heterogeneous Memory Systems for High Performance and Scalability , 2019, MICRO.

[84]  Onur Mutlu,et al.  A Case for Richer Cross-Layer Abstractions: Bridging the Semantic Gap with Expressive Memory , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[85]  Kiyoo Itoh,et al.  A 0.5-V FD-SOI Twin-Cell DRAM with Offset-Free Dynamic-VT Sense Amplifiers , 2006, ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design.

[86]  Onur Mutlu,et al.  Improving DRAM performance by parallelizing refreshes with accesses , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[87]  Song Liu,et al.  Flikker: saving DRAM refresh-power through critical data partitioning , 2011, ASPLOS XVI.

[88]  Chia-Lin Yang,et al.  SECRET: Selective error correction for refresh energy reduction in DRAMs , 2012, 2012 IEEE 30th International Conference on Computer Design (ICCD).

[89]  Mor Harchol-Balter,et al.  ATLAS : A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers , 2010 .

[90]  Onur Mutlu,et al.  Tiered-latency DRAM: A low latency and low cost DRAM architecture , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[91]  Onur Mutlu,et al.  Demystifying Complex Workload-DRAM Interactions: An Experimental Study , 2019, SIGMETRICS.

[92]  Thomas Vogelsang,et al.  Understanding the Energy Consumption of Dynamic Random Access Memories , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[93]  Kee-Won Kwon,et al.  An 8Gb/s/pin 9.6ns Row-Cycle 288Mb Deca-Data Rate SDRAM with an I/O Error-Detection Scheme , 2006, 2006 IEEE International Solid State Circuits Conference - Digest of Technical Papers.

[94]  Onur Mutlu,et al.  Gather-Scatter DRAM: In-DRAM address translation to improve the spatial locality of non-unit strided accesses , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[95]  Onur Mutlu,et al.  Ramulator: A Fast and Extensible DRAM Simulator , 2016, IEEE Computer Architecture Letters.

[96]  Sally A. McKee,et al.  DTail: a flexible approach to DRAM refresh management , 2014, ICS '14.