Area and energy-efficient buffer designs for NoC based on domain-wall memory

The network-on-chip (NoC) is a major contributor to power consumption in modern many-core processors, especially its routers, which contain a large number of virtual channel (VC) first-in–first-out (FIFO) buffers. In this paper, we propose three buffer designs that leverage the unique serial access mechanism, non-volatility, and high density of Domain-Wall Memory (DWM) to replace conventional SRAM-based buffers in NoC routers. Experiments demonstrate that the proposed DWM designs achieve considerable improvements in area and power efficiency. The best-performing proposed design achieves area savings of 36.1% (24.2%) and power savings of 55.1% (24.5%) over conventional SRAM-based (STT-MRAM-based) designs, without performance degradation.

Key words: Networks-on-chip (NoC), Router, Domain-Wall Memory, Buffer

Classification: Integrated circuits (memory)
