Compile-Time Silent-Store Elimination for Energy Efficiency: an Analytic Evaluation for Non-Volatile Cache Memory

Energy-efficiency has become very critical in modern high-performance and embedded systems. In on-chip systems, memory consumes an important part of energy. Emerging non-volatile memory (NVM) technologies, such as Spin-Transfer Torque RAM (STT-RAM), offer power saving opportunities, while they suffer from high write latency. In this paper, we propose a fast evaluation of NVM integration at cache level, together with a compile-time approach for mitigating the penalty incurred by the high write latency of STT-RAM. We implement a code optimization in LLVM for reducing so-called silent stores, i.e., store instruction instances that write to memory values that were already present there. This makes our optimization portable over any architecture supporting LLVM. Then, we assess the possible benefit of such an optimization on the Rodinia benchmark suite through an analytic approach based on parameters extracted from the literature devoted to NVMs. This makes it possible to rapidly analyze the impact of NVMs on memory energy consumption. Reported results show up to 42% energy gain when considering STT-RAM caches.

[1]  Wei-Kai Cheng,et al.  Architecture and data migration methodology for L1 cache design with hybrid SRAM and volatile STT-RAM configuration , 2016, Microprocess. Microsystems.

[2]  Kevin Skadron,et al.  A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads , 2010, IEEE International Symposium on Workload Characterization (IISWC'10).

[3]  Jianhua Li,et al.  Code Motion for Migration Minimization in STT-RAM Based Hybrid Cache , 2012, 2012 IEEE Computer Society Annual Symposium on VLSI.

[4]  H. Noguchi,et al.  A 4ns, 0.9V write voltage embedded perpendicular STT-MRAM fabricated by MTJ-Last process , 2014, Proceedings of Technical Program - 2014 International Symposium on VLSI Technology, Systems and Application (VLSI-TSA).

[5]  Dean M. Tullsen,et al.  Software data-triggered threads , 2012, OOPSLA '12.

[6]  Mikko H. Lipasti,et al.  Characterization of silent stores , 2000, Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00622).

[7]  Yi He,et al.  Reducing write activities on non-volatile memories in embedded CMPs via data migration and recomputation , 2010, Design Automation Conference.

[8]  Vikram S. Adve,et al.  LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[9]  Dean M. Tullsen,et al.  Data-triggered threads: Eliminating redundant computation , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[10]  Cong Xu,et al.  NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[11]  Satoshi Takaya,et al.  7.5 A 3.3ns-access-time 71.2μW/MHz 1Mb embedded STT-MRAM using physically eliminated read-disturb scheme and normally-off memory architecture , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.

[12]  Jianhua Li,et al.  STT-RAM based energy-efficiency hybrid cache for CMPs , 2011, 2011 IEEE/IFIP 19th International Conference on VLSI and System-on-Chip.

[13]  Jun Yang,et al.  Energy reduction for STT-RAM using early write termination , 2009, 2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers.

[14]  Jianhua Li,et al.  MAC: migration-aware compilation for STT-RAM based hybrid cache in embedded systems , 2012, ISLPED '12.

[15]  Shasha Wen,et al.  REDSPY: Exploring Value Locality in Software , 2017, ASPLOS.

[16]  Xiang Pan,et al.  NVSleep: Using non-volatile memory to enable fast sleep/wakeup of idle cores , 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).

[17]  Dean M. Tullsen,et al.  CDTT: Compiler-generated data-triggered threads , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[18]  Sophiane Senni,et al.  MAGPIE: System-level Evaluation of Manycore Systems with Emerging Memory Technologies , 2017 .

[19]  Mikko H. Lipasti,et al.  Silent Stores and Store Value Locality , 2001, IEEE Trans. Computers.

[20]  Jianhua Li,et al.  Compiler-Assisted STT-RAM-Based Hybrid Cache for Energy Efficient Embedded Systems , 2014, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[21]  Luca Benini,et al.  System-level power optimization: techniques and tools , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[22]  Jeffrey S. Vetter,et al.  A Survey of Software Techniques for Using Non-Volatile Memories for Storage and Main Memory Systems , 2016, IEEE Transactions on Parallel and Distributed Systems.

[23]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[24]  Yiran Chen,et al.  A novel architecture of the 3D stacked MRAM L2 cache for CMPs , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[25]  Edwin Hsing-Mean Sha,et al.  MGC: Multiple graph-coloring for non-volatile memory based hybrid Scratchpad Memory , 2012, 2012 16th Workshop on Interaction between Compilers and Computer Architectures (INTERACT).

[26]  Xiaoxia Wu,et al.  Power and performance of read-write aware Hybrid Caches with non-volatile memories , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[27]  Mark Horowitz,et al.  Energy dissipation in general purpose microprocessors , 1996, IEEE J. Solid State Circuits.

[28]  Erik Brockmeyer,et al.  Data Memory Organization and Optimizations in Application-Specific Systems , 2001, IEEE Des. Test Comput..

[29]  Chita R. Das,et al.  Cache revive: Architecting volatile STT-RAM caches for enhanced performance in CMPs , 2012, DAC Design Automation Conference 2012.

[30]  Mircea R. Stan,et al.  Relaxing non-volatility for fast and energy-efficient STT-RAM caches , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[31]  Mikko H. Lipasti,et al.  On the value locality of store instructions , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).