Exploiting Heterogeneity for Energy Efficiency in

Heterogeneous multicores are a promising design paradigm for combating the power, memory, and reliability walls that impede chip design in deep submicron technology. Future multicores are expected to integrate multiple different cores, including GPGPUs, custom accelerators, and configurable cores. In this paper, we introduce an important dimension, technology, along which heterogeneity can be introduced in multicores to improve their energy-performance envelope. Specifically, we analyze the benefits of heterogeneous technologies for processor cores and cache subsystems. We discuss two promising device candidates (Tunnel-FET and Magnetic-RAM) for introducing technological diversity in multicores and analyze their integration in the processor and cache hierarchy in detail. Our analysis shows that introducing this kind of heterogeneity can significantly enhance the performance and energy behavior of future multicore systems.

Index Terms: Bistable circuits, circuit simulation, field-effect transistors (FETs), heterojunctions, hybrid integrated circuits, logic circuits, magnetic circuits, magnetic tunneling, MOSFET circuits, microprocessors, tunneling.
