A Locality-Aware Compression Scheme for Highly Reliable Embedded Systems

Dynamic random access memory (DRAM) reliability has become a critical issue in embedded systems, because advances in DRAM process technology are increasing bit error probability. Unfortunately, redundant error-correction code (ECC) chips cannot be applied to embedded systems: to meet form factor, cost, and pin count constraints, cores and DRAMs are tightly coupled without a dual in-line memory module (DIMM) slot. ECC parities are therefore typically placed in the same physical array as the user and system data. This coexistence degrades data locality, which can be a critical cause of DRAM performance degradation. To address this issue, we propose an ECC scheme called locality-aware compression (LoComp), which integrates a compression algorithm, a DRAM data layout, and a memory controller optimized specifically for embedded systems. In designing the DRAM data layout, we focus on the locality between data and their corresponding metadata as well as on spatial data locality, which reduces the number of row activations. The key feature of the compression algorithm is that it corrects the misalignment of data streams caused by the data packing used in many embedded systems. Moreover, we specialize the memory controller to reduce DRAM accesses for ECC parities and compression flags. Its core technologies are a set of small caches for metadata and support for partial write operations without changing the DRAM interface. LoComp+, an enhanced version of LoComp, further reduces DRAM accesses for metadata by placing the metadata close to the corresponding data. In our experiments, previous embedded ECC schemes increase DRAM access time by 68% to more than 100% compared with ECC DIMM, whereas LoComp and LoComp+ reduce this performance degradation by 33% and 48%, respectively. In other words, LoComp and LoComp+ improve performance by 13% to 33% over previous embedded ECC schemes.
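To make the trade-off concrete, the following Python sketch models the general idea behind compression-based embedded ECC: if a block compresses well enough, its parity fits inside the same block and a read costs a single DRAM access. Otherwise the parity lives in a separate metadata region, and the controller must also look up a compression flag. This is not LoComp's algorithm. The 64-byte block, the (72,64) SEC-DED overhead of 8 parity bytes per block, the simplified base-delta compressor with a hypothetical 1-byte header, the single shared LRU metadata cache, and all function names are assumptions made for illustration.

```python
"""Toy model of compression-based inline ECC (illustrative assumptions only).

Assumed: 64-byte blocks, (72,64) SEC-DED (one parity byte per 64-bit word),
a simplified base-delta compressor, and one small shared LRU metadata cache.
None of these details are taken from LoComp itself.
"""
import struct
from collections import OrderedDict

BLOCK_BYTES = 64
WORDS = BLOCK_BYTES // 8                          # eight 64-bit words per block
ECC_BYTES = WORDS                                 # 8 check bits per 64-bit word
PARITY_BLOCKS_PER_LINE = BLOCK_BYTES // ECC_BYTES # parity of 8 blocks per line
FLAG_BLOCKS_PER_LINE = BLOCK_BYTES * 8            # one flag bit per block


def base_delta_size(block: bytes) -> int:
    """Compressed size: 1-byte header + 8-byte base + fixed-width deltas.

    Returns BLOCK_BYTES when the block is stored uncompressed.
    """
    assert len(block) == BLOCK_BYTES
    words = struct.unpack("<8q", block)
    deltas = [w - words[0] for w in words]
    if all(d == 0 for d in deltas):
        return 1 + 8                              # all words identical
    for width in (1, 2, 4):
        limit = 1 << (8 * width - 1)              # signed range of the delta
        if all(-limit <= d < limit for d in deltas):
            return 1 + 8 + width * (WORDS - 1)    # first delta is always 0
    return BLOCK_BYTES


def placement(block: bytes) -> str:
    """'inline' if compressed data plus parity fit in one block."""
    if base_delta_size(block) + ECC_BYTES <= BLOCK_BYTES:
        return "inline"
    return "separate"


def dram_accesses(trace, cache_entries=4):
    """Count DRAM accesses for a read trace of (block_address, block_bytes).

    Each read fetches its data block. The compression flag, and the parity
    of 'separate' blocks, sit in metadata lines that are fetched from DRAM
    only on a miss in a small LRU cache.
    """
    cache = OrderedDict()
    total = 0

    def touch(key):
        nonlocal total
        if key in cache:
            cache.move_to_end(key)                # LRU hit: refresh recency
        else:
            total += 1                            # miss: one metadata access
            cache[key] = True
            if len(cache) > cache_entries:
                cache.popitem(last=False)         # evict least recently used

    for addr, block in trace:
        total += 1                                # the data block itself
        touch(("flag", addr // FLAG_BLOCKS_PER_LINE))
        if placement(block) == "separate":
            touch(("parity", addr // PARITY_BLOCKS_PER_LINE))
    return total


if __name__ == "__main__":
    small = struct.pack("<8q", *range(1000, 1008))            # deltas 0..7
    wide = struct.pack("<8q", *[i << 40 for i in range(8)])   # large deltas
    print(placement(small), placement(wide))                  # inline separate
    print(dram_accesses([(a, small) for a in range(4)]))      # 5
    print(dram_accesses([(a, wide) for a in range(4)]))       # 6
```

In this toy model, one parity line covers eight adjacent blocks, so sequential reads of incompressible data miss in the metadata cache only once per line. The model counts accesses only; it ignores row activations, which is where keeping metadata physically close to its data would matter in a real DRAM.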
