A DWM-Based Stack Architecture Implementation for Energy Harvesting Systems

Energy harvesting systems often rely on non-volatile processors to sustain computation under an intermittent power supply. While previous non-volatile processors have been built on register architectures, a stack architecture, known for its simplicity and small footprint, may be a better fit for energy harvesting systems. In this work, Domain Wall Memory (DWM) is used to implement ZPU, a stack-based CPU described as the world's smallest working CPU. DWM not only offers ultra-high density and access latency comparable to SRAM; its sequential access structure also makes it well suited to a stack, whose accesses exhibit high temporal locality. Because the performance and energy of DWM are determined by the number of shift operations needed to access the stack, this paper further reduces shift operations through novel data placement and micro-code transformation optimizations. The impact of compiler optimization techniques on the number of shift operations is also investigated in order to select the most effective optimizations for a DWM-based stack machine. Experimental studies confirm that the proposed DWM-based stack architectures improve the performance and energy efficiency of energy harvesting systems.
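Why stack locality and shift cost fit together can be illustrated with a toy model. The sketch below is not the paper's design. It assumes a single DWM track with one access port and a cost of one shift per domain moved. It also assumes hypothetical placement functions that map logical stack slots to physical domains. The names `DWMTrack`, `DWMStack`, and `run_trace` are illustrative only.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass
class DWMTrack:
    """Toy model of one DWM track with a single access port (an assumption).

    Accessing a domain first shifts the track until that domain sits under
    the port; the cost is the distance moved, accumulated in `shifts`.
    """
    length: int
    port: int = 0    # domain currently aligned with the access port
    shifts: int = 0  # cumulative number of single-domain shift operations

    def access(self, domain: int) -> None:
        if not 0 <= domain < self.length:
            raise IndexError(f"domain {domain} outside track of length {self.length}")
        self.shifts += abs(domain - self.port)
        self.port = domain


class DWMStack:
    """Operand stack whose logical slots are mapped onto one DWM track."""

    def __init__(self, track: DWMTrack, placement: Callable[[int], int]):
        self.track = track
        self.placement = placement  # hypothetical map: logical slot -> physical domain
        self.values: list[int] = []

    def push(self, value: int) -> None:
        # Write into the slot just above the current top of stack.
        self.track.access(self.placement(len(self.values)))
        self.values.append(value)

    def pop(self) -> int:
        if not self.values:
            raise IndexError("pop from empty stack")
        # Read the current top-of-stack slot.
        self.track.access(self.placement(len(self.values) - 1))
        return self.values.pop()


def run_trace(placement: Callable[[int], int], length: int = 16) -> tuple[int, int]:
    """Evaluate (1 + 2) * (3 + 4) in postfix form; return (result, shifts)."""
    track = DWMTrack(length)
    stack = DWMStack(track, placement)

    def binop(fn: Callable[[int, int], int]) -> None:
        b = stack.pop()
        a = stack.pop()
        stack.push(fn(a, b))

    stack.push(1)
    stack.push(2)
    binop(operator.add)
    stack.push(3)
    stack.push(4)
    binop(operator.add)
    binop(operator.mul)
    return stack.pop(), track.shifts


if __name__ == "__main__":
    # Consecutive placement keeps the top of stack adjacent to the port.
    print(run_trace(lambda slot: slot))             # (21, 6)
    # A hypothetical strided placement spreads adjacent slots apart.
    print(run_trace(lambda slot: (5 * slot) % 16))  # (21, 30)
```

Under these assumptions, consecutive placement costs at most one shift per stack access, because successive pushes and pops touch the same or a neighboring slot. The strided mapping runs the same trace but moves the track five times as far. The paper's own data placement and micro-code transformations are not reproduced here.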
