Domain-wall memory buffer for low-energy NoCs

Networks-on-chip (NoCs) have become a leading energy consumer in modern multi-core processors, with a considerable portion of this energy originating from the large number of virtual channel (FIFO) buffers. While emerging memories have been considered for many architectural components such as caches, few such replacements have been proposed for NoCs, because of the asymmetric access properties of these memories and the small size of network FIFOs relative to the peripheral circuitry they require. In this paper, we propose control schemes that leverage the "shift-register" nature of spintronic domain-wall memory (DWM) to replace conventional memory buffers in the NoC. Our results indicate that the best shift-based scheme uses a dual-nanowire approach, which aligns reads and writes with the access ports more effectively so that both can be served in the same cycle. Our approach provides a 2.93X speedup and a 1.16X energy savings over a DWM buffer using a traditional FIFO memory control scheme. Compared to an SRAM FIFO, it incurs an 8% message latency degradation in exchange for a 56% energy reduction. The resulting approach achieves a 53% reduction in energy-delay product compared to SRAM and a 42% reduction in energy-delay product compared to STT-MRAM.
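The SRAM comparison can be sanity-checked with a short calculation. The sketch below assumes that energy-delay product (EDP) is simply energy multiplied by delay, and that the 8% message latency degradation can stand in for the delay term; the abstract does not state how EDP was actually computed, and the function and variable names are illustrative only.

```python
def relative_edp(energy_ratio: float, delay_ratio: float) -> float:
    """Return EDP relative to a baseline, given energy and delay ratios.

    Assumes EDP = energy * delay, so the ratios multiply.
    """
    return energy_ratio * delay_ratio


# Figures quoted in the abstract, relative to the SRAM FIFO baseline.
energy_ratio = 1.0 - 0.56  # 56% energy reduction
delay_ratio = 1.0 + 0.08   # 8% latency degradation (assumed proxy for delay)

edp_ratio = relative_edp(energy_ratio, delay_ratio)
edp_reduction_pct = (1.0 - edp_ratio) * 100.0

print(f"relative EDP = {edp_ratio:.4f}, reduction = {edp_reduction_pct:.1f}%")
# prints: relative EDP = 0.4752, reduction = 52.5%
```

Under these assumptions the rounded inputs give a reduction of about 52.5%, which is in line with the reported 53%; the small gap is plausibly due to rounding of the quoted percentages.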
