Performance enhancement under power constraints using heterogeneous CMOS-TFET multicores

Device-level heterogeneity promises high energy efficiency over a wider range of voltages than any single device technology can provide. In this paper, starting from device models, we first present ground-up models of CMOS and TFET cores and validate them against existing processors. Using these core models, we construct a 32-core heterogeneous TFET-CMOS multicore. We then show that identifying the ideal runtime configuration for such a multicore is very challenging: it requires finding the best number and type of cores to activate, along with the voltage and frequency at which to run them. To use this heterogeneous processor effectively, we propose a novel automated runtime scheme. The scheme automatically improves the performance of applications running on heterogeneous CMOS-TFET multicores under a fixed power budget, without requiring any effort from the application programmer or the user. It combines heterogeneous thread-to-core mapping, dynamic work partitioning, and dynamic power partitioning to identify energy-efficient operating points. Simulations show that our runtime scheme enables a CMOS-TFET multicore to serve diverse workloads with high energy efficiency and to achieve a 21% average speedup over the best-performing equivalent homogeneous multicore.
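To give a feel for the size of the configuration space described above, the following is a minimal sketch of an exhaustive search over core counts and voltages under a fixed power budget. It is not the runtime scheme proposed in the paper. The 16+16 core split, the voltage levels, the linear frequency model, the `c_dyn * V^2 * f + leak * V` power model, and all numeric parameters are hypothetical values chosen only for illustration, and the workload is assumed to be perfectly parallel.

```python
from itertools import product

# Hypothetical per-core parameters (illustrative only, NOT taken from the paper).
# Assumed toy models: freq(v) = slope * (v - vth)
#                     power(v) = c_dyn * v^2 * freq(v) + leak * v
CORE_TYPES = {
    "cmos": {"voltages": [0.6, 0.8, 1.0], "vth": 0.3, "slope": 4.0,
             "c_dyn": 1.0, "leak": 0.2},
    "tfet": {"voltages": [0.3, 0.4, 0.5], "vth": 0.1, "slope": 3.0,
             "c_dyn": 0.8, "leak": 0.01},
}


def freq(kind, v):
    """Toy frequency model (arbitrary units) for one core of the given type."""
    p = CORE_TYPES[kind]
    return p["slope"] * (v - p["vth"])


def power(kind, v):
    """Toy power model (arbitrary units): dynamic plus leakage."""
    p = CORE_TYPES[kind]
    return p["c_dyn"] * v * v * freq(kind, v) + p["leak"] * v


def best_config(budget, max_cores_per_type=16):
    """Enumerate every (core count, voltage) combination that fits the budget
    and return the one with the highest aggregate throughput."""
    best = None
    counts = range(max_cores_per_type + 1)
    v_pairs = list(product(CORE_TYPES["cmos"]["voltages"],
                           CORE_TYPES["tfet"]["voltages"]))
    for n_cmos, n_tfet in product(counts, counts):
        for v_cmos, v_tfet in v_pairs:
            total_power = (n_cmos * power("cmos", v_cmos)
                           + n_tfet * power("tfet", v_tfet))
            if total_power > budget:
                continue  # violates the fixed power budget
            # Assumes a perfectly parallel workload: throughput adds up.
            throughput = (n_cmos * freq("cmos", v_cmos)
                          + n_tfet * freq("tfet", v_tfet))
            if best is None or throughput > best["throughput"]:
                best = {"n_cmos": n_cmos, "v_cmos": v_cmos,
                        "n_tfet": n_tfet, "v_tfet": v_tfet,
                        "power": total_power, "throughput": throughput}
    return best


if __name__ == "__main__":
    for budget in (2.0, 10.0, 40.0):
        print(budget, best_config(budget))
```

Even this toy version evaluates 17 × 17 × 9 = 2601 configurations for a single chip-wide voltage per core type. Per-core voltages, workloads that are not perfectly parallel, and phase changes at runtime enlarge the space further, which is why an automated scheme is attractive.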
