Exploring Fine-Grained Heterogeneity with Composite Cores

Heterogeneous multicore systems, which comprise multiple cores with varying performance and energy characteristics, have emerged as a promising approach to increasing energy efficiency. Such systems reduce energy consumption by identifying application phases and migrating execution to the most efficient core that meets performance requirements. However, the overheads of migrating between cores limit these opportunities to coarse-grained phases (hundreds of millions of instructions), reducing the potential to exploit energy-efficient cores. We propose Composite Cores, an architecture that reduces migration overheads by bringing heterogeneity within a single core. Composite Cores pairs a big and a little compute μEngine that together achieve high performance and energy efficiency. Because the μEngines share architectural state, the migration overhead is reduced, enabling fine-grained migration and increasing the opportunities to use the little μEngine without sacrificing performance. An intelligent controller migrates the application between μEngines to maximize energy efficiency while constraining performance loss to a configurable bound. We evaluate Composite Cores using cycle-accurate microarchitectural simulations and a detailed power model. Results show that, on average, Composite Cores map 30 percent of the execution time to the little μEngine, achieving 21 percent energy savings while maintaining 95 percent performance.
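
To make the idea of "constraining performance loss to a configurable bound" concrete, here is a minimal Python sketch. It is a hypothetical illustration, not the controller evaluated in this work. It assumes that per-quantum cycle estimates for both μEngines are available as inputs (the `Quantum` type, its fields, and the `schedule` function are invented for this example). The sketch greedily runs a quantum on the little μEngine only when the cumulative cycle count stays within the all-big cycle count divided by the performance target (0.95 here, mirroring the 95 percent figure above).

```python
from dataclasses import dataclass


@dataclass
class Quantum:
    # Hypothetical inputs: estimated cycles for this quantum on each μEngine.
    big_cycles: float
    little_cycles: float


def schedule(quanta, target=0.95):
    """Greedy per-quantum μEngine selection under a performance bound.

    A quantum runs on the little μEngine only if total cycles so far stay
    within (cycles of an all-big run) / target. Returns the list of choices
    and the achieved performance relative to an all-big run.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    actual = 0.0    # cycles actually spent so far
    big_only = 0.0  # cycles an all-big run would have spent so far
    choices = []
    for q in quanta:
        big_only += q.big_cycles
        budget = big_only / target  # largest cycle count the bound allows
        if actual + q.little_cycles <= budget:
            choices.append("little")
            actual += q.little_cycles
        else:
            choices.append("big")
            actual += q.big_cycles
    performance = big_only / actual if actual else 1.0
    return choices, performance


if __name__ == "__main__":
    trace = [Quantum(100, 150), Quantum(100, 110),
             Quantum(100, 300), Quantum(100, 105)]
    choices, perf = schedule(trace, target=0.95)
    print(choices, f"{perf:.3f}")  # ['big', 'little', 'big', 'little'] 0.964
```

Because falling back to the big μEngine never pushes the run past its budget when the target is at most 1, the bound holds after every quantum, so the achieved performance never drops below the target. A real controller would have to predict these per-quantum estimates rather than know them in advance, which this sketch deliberately leaves out.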
