Racetrack Queues for Extremely Low-Energy FIFOs

Networks-on-chip (NoCs) have become a leading energy consumer in modern multicore processors, and a considerable portion of this energy originates from the large number of virtual-channel first-in, first-out (FIFO) buffers. Motivated by this, we propose control schemes that leverage the "shift-register" nature of spintronic domain-wall memory (DWM) to create extremely low-energy FIFO queues. To test these queues in their most relevant application context, as replacements for conventional memory buffers in NoCs, we perform a design-space analysis of the different schemes in a network setting and then evaluate the best schemes with benchmark traffic. Our results indicate that the best shift-based buffer uses a dual-nanowire approach, which aligns reads and writes with the access ports more effectively so that both can be performed in the same cycle. Our approach provides a $2.93\times$ speedup over a DWM buffer that uses a traditional FIFO memory control scheme, along with 23.4% energy savings. The resulting approach achieves a 39% reduction in energy-delay product compared to SRAM and a 24% reduction in energy-delay product compared to spin-transfer torque magnetic memories.
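To make the shift-register idea concrete, the following is a minimal toy sketch in Python and not the control schemes evaluated in this work. It assumes a single track with one fixed access port, where a slot can be accessed only after the track has been shifted to bring it under the port. The class name `ShiftFifo`, its fields, and the one-shift-per-position cost model are all hypothetical and chosen only for illustration.

```python
class ShiftFifo:
    """Toy FIFO on one shift-only track with a single access port.

    Hypothetical model: each one-position shift costs one unit, and a slot
    can be read or written only while it sits under the port.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0      # slot index of the oldest entry
        self.tail = 0      # slot index of the next free slot
        self.count = 0     # number of stored entries
        self.aligned = 0   # slot index currently under the port
        self.shifts = 0    # total one-position shifts performed so far

    def _align(self, slot):
        # Shift the track until `slot` is under the port, counting the cost.
        self.shifts += abs(slot - self.aligned)
        self.aligned = slot

    def push(self, item):
        if self.count == self.capacity:
            raise OverflowError("FIFO full")
        self._align(self.tail)
        self.slots[self.tail] = item
        self.tail = (self.tail + 1) % self.capacity
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("FIFO empty")
        self._align(self.head)
        item = self.slots[self.head]
        self.head = (self.head + 1) % self.capacity
        self.count -= 1
        return item
```

In this toy model, interleaving pushes and pops forces the single port to travel back and forth between the head and tail slots, so the shift count grows with queue occupancy. That is one way to see why aligning reads and writes with separate access ports, as in the dual-nanowire approach described above, could reduce shifting.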
