A Lightweight Method to Evaluate Effect of Approximate Memory with Hardware Performance Monitors

The latency and energy consumption of DRAM are serious concerns because (1) DRAM latency has not improved much for decades and (2) recent machines have large main-memory capacities. Device-level studies reduce both by shortening the wait times of DRAM internal operations so that they finish faster and consume less energy. Applying these techniques aggressively to realize approximate memory is a promising direction for further reducing the overhead, given that many data-center applications today are to some extent robust to bit-flips. To advance research on approximate memory, its effect on applications must be evaluated so that both researchers and potential users can investigate how it affects realistic workloads. However, hardware simulators are too slow to run workloads repeatedly with different parameters. To this end, we propose a lightweight method to evaluate the effect of approximate memory. The idea is to count the number of DRAM internal operations that occur on the approximate data of an application and to calculate the probability of bit-flips from those counts, instead of using heavyweight simulators. Our evaluation shows that our system is three orders of magnitude faster than cycle-accurate simulators, and we also present case studies that evaluate the effect of approximate memory on realistic applications.

key words: approximate memory, computer architecture, memory systems
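To make the idea concrete, the following is a minimal sketch (not the paper's implementation) of the final calculation step. It assumes a simple error model in which each DRAM internal operation flips a given bit independently with probability p_flip; the names bit_flip_probability and expected_flipped_bits are hypothetical, p_flip would come from device-level measurements, and num_ops would come from hardware performance monitors, neither of which is shown here.

```python
# A minimal sketch under an assumed error model, not the paper's method.
# Assumptions:
#   - each DRAM internal operation (e.g., an activation with a shortened
#     timing parameter) flips a given bit independently with probability
#     p_flip, taken from device-level studies;
#   - num_ops is the number of such operations that touched the approximate
#     data, counted via hardware performance monitors.

def bit_flip_probability(num_ops: int, p_flip: float) -> float:
    """Probability that a single bit flips at least once over num_ops operations."""
    return 1.0 - (1.0 - p_flip) ** num_ops

def expected_flipped_bits(num_ops: int, p_flip: float, num_bits: int) -> float:
    """Expected number of flipped bits in an approximate region of num_bits,
    assuming every bit experiences the same num_ops operations."""
    return num_bits * bit_flip_probability(num_ops, p_flip)

# Example with hypothetical numbers: one million operations, a per-operation
# flip rate of 1e-9, and a 64 MiB approximate region.
if __name__ == "__main__":
    p = bit_flip_probability(num_ops=10**6, p_flip=1e-9)
    print(f"per-bit flip probability: {p:.3e}")
    print(f"expected flipped bits: {expected_flipped_bits(10**6, 1e-9, 64 * 2**20 * 8):.1f}")
```

Because the counts are gathered while the workload runs natively, this arithmetic replaces the per-cycle bookkeeping a simulator would do, which is where the speedup comes from.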
