Resistive computation: avoiding the power wall with low-leakage, STT-MRAM based computing

As CMOS scales beyond the 45nm technology node, leakage concerns are starting to limit microprocessor performance growth. To keep dynamic power constant across process generations, traditional MOSFET scaling theory prescribes reducing supply and threshold voltages in proportion to device dimensions, a practice that induces an exponential increase in subthreshold leakage. As a result, leakage power has become comparable to dynamic power in current-generation processes, and will soon exceed it in magnitude if voltages are scaled down any further. Beyond this inflection point, multicore processors will not be able to afford keeping more than a small fraction of all cores active at any given moment. Multicore scaling will soon hit a power wall. This paper presents resistive computation, a new technique that aims at avoiding the power wall by migrating most of the functionality of a modern microprocessor from CMOS to spin-torque transfer magnetoresistive RAM (STT-MRAM)---a CMOS-compatible, leakage-resistant, non-volatile resistive memory technology. By implementing much of the on-chip storage and combinational logic using leakage-resistant, scalable RAM blocks and lookup tables, and by carefully re-architecting the pipeline, an STT-MRAM based implementation of an eight-core Sun Niagara-like CMT processor reduces chip-wide power dissipation by 1.7× and leakage power by 2.1× at the 32nm technology node, while maintaining 93% of the system throughput of a CMOS-based design.

[1]  Stefan Rusu,et al.  A 45nm 8-core enterprise Xeon ® processor , 2009 .

[2]  Kunle Olukotun,et al.  Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.

[3]  D. Burger,et al.  Memory Bandwidth Limitations of Future Microprocessors , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[4]  Yu Cao,et al.  New generation of predictive technology model for sub-45nm design exploration , 2006, 7th International Symposium on Quality Electronic Design (ISQED'06).

[5]  Daisuke Suzuki,et al.  Fabrication of a nonvolatile lookup-table circuit chip using magneto/semiconductor-hybrid structure for an immediate-power-up field programmable gate array , 2009, 2009 Symposium on VLSI Circuits.

[6]  Josep Torrellas,et al.  The BubbleWrap many-core: Popping cores for sequential acceleration , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[7]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[8]  Edvin Catovic GRFPU – High Performance IEEE-754 Floating-Point Unit , 2022 .

[9]  Shoji Ikeda,et al.  2Mb Spin-Transfer Torque RAM (SPRAM) with Bit-by-Bit Bidirectional Current Write and Parallelizing-Direction Current Read , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[10]  David H. Bailey,et al.  The Nas Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..

[11]  Stratix vs . Virtex-II Pro FPGA Performance Analysis , 2004 .

[12]  Gerry Kane,et al.  MIPS RISC Architecture , 1987 .

[13]  Yiran Chen,et al.  Spin-transfer torque magnetoresistive content addressable memory (CAM) cell structure design with enhanced search noise margin , 2008, 2008 IEEE International Symposium on Circuits and Systems.

[14]  Doug Burger,et al.  On-chip MRAM as a High-Bandwidth, Low-Latency Replacement for DRAM Physical Memories , 2004 .

[15]  M.A. Horowitz,et al.  Speed and power scaling of SRAM's , 2000, IEEE Journal of Solid-State Circuits.

[16]  S. Takahashi,et al.  Lower-current and fast switching of a perpendicular TMR for high speed and high density spin-transfer-torque MRAM , 2008, 2008 IEEE International Electron Devices Meeting.

[17]  H. Ohno,et al.  Fabrication of a Nonvolatile Full Adder Based on Logic-in-Memory Architecture Using Magnetic Tunnel Junctions , 2008 .

[18]  M. Kund,et al.  A Perpendicular Spin Torque Switching based MRAM for the 28 nm Technology Node , 2007, 2007 IEEE International Electron Devices Meeting.

[19]  B. Bloechel,et al.  A 4-GHz 300-mW 64-bit integer execution ALU with dual supply voltages in 90-nm CMOS , 2004, IEEE Journal of Solid-State Circuits.

[20]  M. Hosomi,et al.  A novel nonvolatile memory with spin torque transfer magnetization switching: spin-ram , 2005, IEEE InternationalElectron Devices Meeting, 2005. IEDM Technical Digest..

[21]  Xiaoxia Wu,et al.  Hybrid cache architecture with disparate memory technologies , 2009, ISCA '09.

[22]  S. Ikeda,et al.  2 Mb SPRAM (SPin-Transfer Torque RAM) With Bit-by-Bit Bi-Directional Current Write and Parallelizing-Direction Current Read , 2008, IEEE Journal of Solid-State Circuits.

[23]  Vikas Agarwal,et al.  Clock rate versus IPC: the end of the road for conventional microarchitectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[24]  Michael D. Ciletti,et al.  Advanced Digital Design with the Verilog HDL , 2010 .

[25]  Improving STT MRAM storage density through smaller-than-worst-case transistor sizing , 2009, 2009 46th ACM/IEEE Design Automation Conference.

[26]  Eric Belhaire,et al.  Spin transfer torque (STT)-MRAM--based runtime reconfiguration FPGA circuit , 2009, TECS.

[27]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[28]  Yiran Chen,et al.  Circuit and microarchitecture evaluation of 3D stacking magnetic RAM (MRAM) as a universal memory replacement , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[29]  Jun Yang,et al.  Energy reduction for STT-RAM using early write termination , 2009, 2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers.

[30]  Jonathan Chang,et al.  A 45 nm 8-Core Enterprise Xeon¯ Processor , 2010, IEEE J. Solid State Circuits.

[31]  Paul D. Franzon,et al.  FreePDK: An Open-Source Variation-Aware Design Kit , 2007, 2007 IEEE International Conference on Microelectronic Systems Education (MSE'07).

[32]  Yoshihiro Ueda,et al.  A 64Mb MRAM with clamped-reference and adequate-reference schemes , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[33]  A. Kumar,et al.  Implementation of an 8-Core, 64-Thread, Power-Efficient SPARC Server on a Chip , 2008, IEEE Journal of Solid-State Circuits.

[34]  Yiran Chen,et al.  A novel architecture of the 3D stacked MRAM L2 cache for CMPs , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[35]  Norman P. Jouppi,et al.  Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[36]  A. Fert,et al.  The emergence of spin electronics in data storage. , 2007, Nature materials.

[37]  Yiming Huai,et al.  Spin-Transfer Torque MRAM (STT-MRAM): Challenges and Prospects , 2008 .