A Hybrid Buffer Design with STT-MRAM for On-Chip Interconnects

As the chip multiprocessor (CMP) design moves toward many-core architectures, communication delay in Network-on-Chip (NoC) has been a major bottleneck in CMP systems. Using high-density memories in input buffers helps to reduce the bottleneck through increasing throughput. Spin-Torque Transfer Magnetic RAM (STT-MRAM) can be a suitable solution due to its nature of high density and near-zero leakage power. But its long latency and high power consumption in write operations still need to be addressed. We explore the design issues in using STT-MRAM for NoC input buffers. Motivated by short intra-router latency, we use the previously proposed write latency reduction technique sacrificing retention time. Then we propose a hybrid design of input buffers using both SRAM and STT-MRAM to hide the long write latency efficiently. Considering that simple data migration in the hybrid buffer consumes more dynamic power compared to SRAM, we provide a lazy migration scheme that reduces the dynamic power consumption of the hybrid buffer. Simulation results show that the proposed scheme enhances the throughput by 21% on average.

[1]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[2]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[3]  Mircea R. Stan,et al.  Relaxing non-volatility for fast and energy-efficient STT-RAM caches , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[4]  裕幸 飯田,et al.  International Technology Roadmap for Semiconductors 2003の要求清浄度について - シリコンウエハ表面と雰囲気環境に要求される清浄度, 分析方法の現状について - , 2004 .

[5]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[6]  Chita R. Das,et al.  Architecting on-chip interconnects for stacked 3D STT-RAM caches in CMPs , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[7]  Jun Yang,et al.  Energy reduction for STT-RAM using early write termination , 2009, 2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers.

[8]  Akif Ali,et al.  Near-optimal worst-case throughput routing for two-dimensional mesh networks , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[9]  William J. Dally,et al.  A delay model and speculative architecture for pipelined routers , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[10]  Engin Ipek,et al.  Resistive computation: avoiding the power wall with low-leakage, STT-MRAM based computing , 2010, ISCA.

[11]  Luciano Lavagno,et al.  High-Level Modeling and Design of Asynchronous Interface Logic , 1995, IEEE Des. Test Comput..

[12]  Vijayalakshmi Srinivasan,et al.  Scalable high performance main memory system using phase-change memory technology , 2009, ISCA '09.

[13]  William J. Dally,et al.  Flattened Butterfly Topology for On-Chip Networks , 2007, IEEE Comput. Archit. Lett..

[14]  William J. Dally Virtual-channel flow control , 1990, ISCA '90.

[15]  Jun Yang,et al.  Fine-grained QoS scheduling for PCM-based main memory systems , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[16]  Andrew B. Kahng,et al.  ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[17]  Jun Yang,et al.  A durable and energy efficient main memory using phase change memory technology , 2009, ISCA '09.

[18]  Chita R. Das,et al.  Cache revive: Architecting volatile STT-RAM caches for enhanced performance in CMPs , 2012, DAC Design Automation Conference 2012.

[19]  Moinuddin K. Qureshi,et al.  Improving read performance of Phase Change Memories via Write Cancellation and Write Pausing , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[20]  William J. Dally,et al.  Deadlock-Free Message Routing in Multiprocessor Interconnection Networks , 1987, IEEE Transactions on Computers.

[21]  Yiran Chen,et al.  A novel architecture of the 3D stacked MRAM L2 cache for CMPs , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.