Longevity of Commodity DRAMs in Harsh Environments Through Thermoelectric Cooling

Commodity hardware is increasingly used in safety-critical applications, such as advanced driver assistance systems in the automotive domain. These applications demand very high reliability of electronic components even under adverse environmental conditions such as high temperatures, where ensuring the reliability of microelectronic components is a major challenge. The computing systems in these applications rely on DRAMs as working memory, whose bit cells store charge in capacitors. Commodity DRAMs are optimized for cost per bit rather than for high reliability. Very high temperatures therefore pose an enormous challenge for commodity DRAMs: data retention time and reliability decrease substantially, which jeopardizes data correctness. Up to a certain temperature, data correctness can be ensured by increasing the refresh rate to compensate for the reduced retention time. However, this severely degrades access latency and usable DRAM bandwidth. To overcome these limitations, we present, for the first time, a Thermoelectric Cooling (TEC) solution for commodity DRAMs in harsh environments, such as automotive. Our TEC solution enables the use of commodity off-the-shelf DRAMs in safety-critical applications by lowering their temperature to a range in which they operate reliably. The TEC solution is applied a posteriori to the DRAM chips, without high-cost packaging solutions. It thus maintains the low-cost targets of such devices, improves reliability, and at the same time counterbalances the adverse effects of an increased refresh rate. To quantitatively evaluate the benefits of TEC for commodity DRAMs in harsh environments, we performed system-level evaluations with several applications, backed by measured data from commodity DRAMs.
Our experimental results, obtained with accurate multi-physics simulations based on the finite element method, demonstrate that TEC-based cooling keeps the maximum temperature of all DRAM chips below 85°C, even though the original on-chip temperature (i.e., without our TEC-based cooling) exceeds 120°C.
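
The refresh trade-off described in the abstract can be illustrated with a first-order estimate: with all-bank refresh, a rank is unavailable for roughly tRFC out of every tREFI, so shortening the refresh interval raises the fraction of time lost to refresh proportionally. The sketch below is only an illustration, not the evaluation method of this work. The timing values (tRFC = 350 ns, base tREFI = 7.8 µs) are assumptions typical of DDR4 datasheets, and the function names are hypothetical.

```python
# First-order estimate of DRAM refresh overhead (illustrative sketch only).
# Assumed, not taken from the paper: tRFC = 350 ns, base tREFI = 7800 ns.

T_RFC_NS = 350.0         # assumed refresh cycle time per refresh command
T_REFI_BASE_NS = 7800.0  # assumed base refresh interval at normal temperature


def refresh_overhead(t_rfc_ns: float, t_refi_ns: float) -> float:
    """Fraction of time a rank is blocked by all-bank refresh commands."""
    if t_refi_ns <= 0 or t_rfc_ns < 0:
        raise ValueError("timings must be positive")
    return t_rfc_ns / t_refi_ns


def overhead_at_rate(rate_multiplier: float) -> float:
    """Overhead when the refresh rate is raised by rate_multiplier.

    A higher refresh rate means a proportionally shorter refresh interval.
    """
    return refresh_overhead(T_RFC_NS, T_REFI_BASE_NS / rate_multiplier)


# Compare the base refresh rate with doubled and quadrupled rates.
for multiplier in (1, 2, 4):
    share = overhead_at_rate(multiplier)
    print(f"{multiplier}x refresh rate: {share:.1%} of time spent refreshing")
```

Under these assumptions the overhead grows linearly with the refresh rate, which is why keeping the chips in a temperature range that does not require a higher refresh rate preserves usable bandwidth. The estimate ignores queuing effects and per-bank refresh, so real latency penalties can differ.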
