Exploiting Asymmetry in eDRAM Errors for Redundancy-Free Error-Tolerant Design

For some applications, the impact of an error on data and on the memory system depends on its direction, that is, on whether it changes a zero to a one or a one to a zero; for an unsigned integer, for example, a one-to-zero error reduces the value and a zero-to-one error increases it. In some memories the errors themselves are also asymmetric; in a DRAM, for example, retention failures discharge the storage cell. Tolerating such asymmetric errors would result in a robust and efficient system design. Error Control Codes (ECCs) are a common technique for protecting memories against these errors, at the cost of redundant memory cells. In this paper, the asymmetry of errors in Embedded DRAMs (eDRAMs) is exploited to build error-tolerant designs that use no ECC or parity and are therefore redundancy-free in terms of memory cells. A model for the impact of retention errors and eDRAM refresh time on the False Positive rate or False Negative rate of some eDRAM applications is proposed and analyzed. Bloom Filters (BFs) and read-only or write-through caches implemented in eDRAMs are considered as the first case studies for this model. For BFs, their tolerance to some zero-to-one errors (but not to one-to-zero errors) is combined with the asymmetry of retention errors in eDRAMs to show that no ECC or parity is needed to protect the filter; moreover, the eDRAM refresh time can be increased significantly, thus reducing power consumption. For caches, this paper shows that the asymmetry in errors can also be exploited with a redundancy-free error-tolerant scheme that introduces only false negatives and no false positives, and therefore causes no data corruption. The proposed redundancy-free implementations are compared with existing schemes for BFs and caches to show their benefits in terms of figures of merit such as memory size, area, encoder/decoder complexity and delay. Finally, in the last case study, we show that the asymmetry of retention errors can be used to provide additional error correction capabilities in Modular Redundancy Schemes.
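
The Bloom filter argument can be illustrated with a small simulation. The sketch below is not the paper's model or implementation; it is a minimal illustration under stated assumptions: the filter bits are stored so that a retention failure can only turn a logical zero into a logical one, each zero bit flips independently with a hypothetical probability `flip_prob`, and the class and function names (`BloomFilter`, `inject_retention_errors`, `demo`) are invented for this example. Under those assumptions, errors can only set additional bits, so inserted elements are never lost and only the false positive rate can grow.

```python
import hashlib
import random


class BloomFilter:
    """Plain Bloom filter over a list of m bits with k hash positions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))


def inject_retention_errors(bf, flip_prob, rng):
    """Assumed error model: only 0 -> 1 flips, independently per zero bit."""
    for i, bit in enumerate(bf.bits):
        if bit == 0 and rng.random() < flip_prob:
            bf.bits[i] = 1


def false_positive_rate(bf, probes):
    """Fraction of never-inserted probes that the filter reports as present."""
    return sum(1 for x in probes if x in bf) / len(probes)


def demo(seed=0):
    rng = random.Random(seed)
    bf = BloomFilter(m=4096, k=4)
    members = [f"in-{i}" for i in range(300)]
    probes = [f"out-{i}" for i in range(2000)]
    for x in members:
        bf.add(x)
    fp_before = false_positive_rate(bf, probes)
    inject_retention_errors(bf, flip_prob=0.05, rng=rng)
    fp_after = false_positive_rate(bf, probes)
    false_negatives = sum(1 for x in members if x not in bf)
    return fp_before, fp_after, false_negatives


if __name__ == "__main__":
    before, after, fn = demo()
    print(f"false positive rate before errors: {before:.4f}")
    print(f"false positive rate after errors:  {after:.4f}")
    print(f"false negatives after errors:      {fn}")
```

Raising `flip_prob` in this sketch plays the role of lengthening the refresh interval: more cells fail, the false positive rate rises, and membership of inserted elements is still preserved. The filter size, hash count, and error probability are arbitrary illustration values, not figures from the paper.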
