Paper information - A unified view of non-monotonic core selection and application steering in heterogeneous chip multiprocessors

A single-ISA heterogeneous chip multiprocessor (HCMP) is an attractive substrate for improving single-thread performance and energy efficiency in the dark silicon era. We consider HCMPs composed of non-monotonic core types, where each core type is performance-optimized for different instruction-level behavior and hence the core types cannot be ranked: different program phases achieve their highest performance on different cores. Although non-monotonic heterogeneous designs offer higher performance potential than either monotonic heterogeneous designs or homogeneous designs, steering applications to the best-performing core is challenging because it is ambiguous which core type will perform best for a given phase.
