A unified view of non-monotonic core selection and application steering in heterogeneous chip multiprocessors

A single-ISA heterogeneous chip multiprocessor (HCMP) is an attractive substrate for improving single-thread performance and energy efficiency in the dark silicon era. We consider HCMPs composed of non-monotonic core types, where each core type is performance-optimized for different instruction-level behavior and the core types therefore cannot be ranked: different program phases achieve their highest performance on different cores. Although non-monotonic heterogeneous designs offer higher performance potential than either monotonic heterogeneous designs or homogeneous designs, steering applications to the best-performing core is challenging because of the performance ambiguity among core types, that is, no single core type is best for every phase.
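The distinction between monotonic and non-monotonic core types can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the core names, the per-phase IPC numbers, and the helper functions `is_monotonic` and `best_core_per_phase` are hypothetical and are not taken from the paper. The sketch treats a set of core types as monotonic when every pair can be ordered so that one core is at least as fast as the other on every phase.

```python
from typing import Dict, List

# Hypothetical per-phase IPC for three core types (illustrative numbers only).
NON_MONOTONIC_IPC: Dict[str, List[float]] = {
    "core_A": [2.0, 0.8, 1.1],
    "core_B": [1.2, 1.5, 0.9],
    "core_C": [0.9, 1.0, 1.6],
}

# Hypothetical big/little pair in which one core wins on every phase.
MONOTONIC_IPC: Dict[str, List[float]] = {
    "big": [2.0, 1.5, 1.6],
    "little": [1.0, 0.8, 0.9],
}


def is_monotonic(perf: Dict[str, List[float]]) -> bool:
    """Return True if the core types can be ranked.

    The cores can be ranked when, for every pair, one core is at least
    as fast as the other on every phase.
    """
    cores = list(perf)
    for i, a in enumerate(cores):
        for b in cores[i + 1:]:
            a_dominates = all(x >= y for x, y in zip(perf[a], perf[b]))
            b_dominates = all(y >= x for x, y in zip(perf[a], perf[b]))
            if not (a_dominates or b_dominates):
                return False
    return True


def best_core_per_phase(perf: Dict[str, List[float]]) -> List[str]:
    """Return the highest-IPC core type for each phase (oracle steering)."""
    num_phases = len(next(iter(perf.values())))
    return [max(perf, key=lambda core: perf[core][p]) for p in range(num_phases)]


if __name__ == "__main__":
    print(is_monotonic(MONOTONIC_IPC))
    print(is_monotonic(NON_MONOTONIC_IPC))
    print(best_core_per_phase(NON_MONOTONIC_IPC))
```

In the non-monotonic table each phase has a different best core, so a steering policy cannot rely on a fixed ranking and must identify the best core phase by phase. The oracle above assumes per-phase performance on every core is already known, which is exactly the information a real steering mechanism lacks.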
