论文信息 - A Hybrid Buffer Design with STT-MRAM for On-Chip Interconnects

A Hybrid Buffer Design with STT-MRAM for On-Chip Interconnects

As the chip multiprocessor (CMP) design moves toward many-core architectures, communication delay in Network-on-Chip (NoC) has been a major bottleneck in CMP systems. Using high-density memories in input buffers helps to reduce the bottleneck through increasing throughput. Spin-Torque Transfer Magnetic RAM (STT-MRAM) can be a suitable solution due to its nature of high density and near-zero leakage power. But its long latency and high power consumption in write operations still need to be addressed. We explore the design issues in using STT-MRAM for NoC input buffers. Motivated by short intra-router latency, we use the previously proposed write latency reduction technique sacrificing retention time. Then we propose a hybrid design of input buffers using both SRAM and STT-MRAM to hide the long write latency efficiently. Considering that simple data migration in the hybrid buffer consumes more dynamic power compared to SRAM, we provide a lazy migration scheme that reduces the dynamic power consumption of the hybrid buffer. Simulation results show that the proposed scheme enhances the throughput by 21% on average.

[1] Fredrik Larsson,et al. Simics: A Full System Simulation Platform , 2002, Computer.

[2] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[3] Mircea R. Stan,et al. Relaxing non-volatility for fast and energy-efficient STT-RAM caches , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[4] 裕幸飯田,et al. International Technology Roadmap for Semiconductors 2003の要求清浄度について－シリコンウエハ表面と雰囲気環境に要求される清浄度, 分析方法の現状について－ , 2004 .

[5] Onur Mutlu,et al. Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[6] Chita R. Das,et al. Architecting on-chip interconnects for stacked 3D STT-RAM caches in CMPs , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[7] Jun Yang,et al. Energy reduction for STT-RAM using early write termination , 2009, 2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers.

[8] Akif Ali,et al. Near-optimal worst-case throughput routing for two-dimensional mesh networks , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[9] William J. Dally,et al. A delay model and speculative architecture for pipelined routers , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[10] Engin Ipek,et al. Resistive computation: avoiding the power wall with low-leakage, STT-MRAM based computing , 2010, ISCA.

[11] Luciano Lavagno,et al. High-Level Modeling and Design of Asynchronous Interface Logic , 1995, IEEE Des. Test Comput..

[12] Vijayalakshmi Srinivasan,et al. Scalable high performance main memory system using phase-change memory technology , 2009, ISCA '09.

[13] William J. Dally,et al. Flattened Butterfly Topology for On-Chip Networks , 2007, IEEE Comput. Archit. Lett..

[14] William J. Dally. Virtual-channel flow control , 1990, ISCA '90.

[15] Jun Yang,et al. Fine-grained QoS scheduling for PCM-based main memory systems , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[16] Andrew B. Kahng,et al. ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[17] Jun Yang,et al. A durable and energy efficient main memory using phase change memory technology , 2009, ISCA '09.

[18] Chita R. Das,et al. Cache revive: Architecting volatile STT-RAM caches for enhanced performance in CMPs , 2012, DAC Design Automation Conference 2012.

[19] Moinuddin K. Qureshi,et al. Improving read performance of Phase Change Memories via Write Cancellation and Write Pausing , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[20] William J. Dally,et al. Deadlock-Free Message Routing in Multiprocessor Interconnection Networks , 1987, IEEE Transactions on Computers.

[21] Yiran Chen,et al. A novel architecture of the 3D stacked MRAM L2 cache for CMPs , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.