STFL-DDR: Improving the Energy-Efficiency of Memory Interface

Power dissipation is a significant problem limiting the performance of today's computer systems. One of the main contributors to power consumption in microprocessors is data movement across cache and memory interfaces. Several solutions, such as low-power interconnects, energy-aware data encoding, and low-power signaling, have been proposed to mitigate this problem, but almost all of them significantly degrade system performance. This article examines the application of a novel technique, called STFL-DDR, for hybrid signaling on low-power DRAM interfaces. To keep power consumption low, STFL-DDR transfers data over low-power wires at a high-performance clock rate. To avoid signal deterioration, STFL-DDR encodes and decodes the data so that no wire switches in two consecutive cycles. STFL-DDR creates new opportunities for optimizing the energy efficiency of DRAM systems. We compare the efficiency of STFL-DDR with state-of-the-art methods by simulating a mix of 12 parallel benchmark applications on a multicore system. Our simulation results indicate that STFL can reduce the energy consumption of a contemporary DRAM interface by 17 percent compared to an LPDDR baseline while achieving the throughput of a high-performance DRAM. Applying STFL to both the last-level cache and the DRAM interface improves system energy, energy-delay product, and performance by 8, 15, and 9 percent, respectively. Compared with a high-performance memory interface, STFL improves system energy and energy-delay product by 25 and 75 percent, while reaching 98 percent of the average performance of the high-performance system.
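
The signaling constraint described above, that no wire switches in two consecutive cycles, can be illustrated with a small checker. This is only a sketch of the constraint as stated in the abstract, not the STFL-DDR encoder or decoder, whose details are not given here. The function names and the representation of a wire as a list of 0/1 levels (one per cycle) are assumptions made for illustration.

```python
def transitions(levels):
    """Return one boolean per cycle boundary: True where the wire changes level.

    `levels` is a hypothetical representation: the 0/1 level of a single
    wire sampled once per cycle.
    """
    return [a != b for a, b in zip(levels, levels[1:])]


def no_consecutive_switching(levels):
    """Check the stated constraint: the wire never switches at two
    consecutive cycle boundaries."""
    t = transitions(levels)
    return not any(x and y for x, y in zip(t, t[1:]))


def count_transitions(levels):
    """Count level changes on the wire (a rough proxy for switching activity)."""
    return sum(transitions(levels))


if __name__ == "__main__":
    ok = [0, 0, 1, 1, 0, 0]   # each level is held for two cycles
    bad = [0, 1, 0, 0]        # switches at two consecutive boundaries
    print(no_consecutive_switching(ok), count_transitions(ok))
    print(no_consecutive_switching(bad), count_transitions(bad))
```

Under this reading, a sequence satisfies the constraint when every level that begins after a transition is held for at least two cycles, which gives a slow-swinging wire time to settle even though the clock runs at the high-performance rate.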
