Procrustes1: Power Constrained Performance Improvement Using Extended Maximize-Then-Swap Algorithm

This paper proposes an efficient algorithm that maximizes performance under power constraints and is applicable in the general context of traditional dynamic voltage/frequency (V/P) scaling, or core heterogeneity and emerging dynamic micro-architectural adaptation. Performance maximization in these scenarios can be essentially viewed as mapping application threads to appropriate core states that have various power/performance characteristics. Such problems are formulated as a generic 0-1 integer linear program (ILP). The proposed algorithm is an iterative heuristic-based solution. Compared with an optimal solution generated by commercial ILP solver, the proposed algorithm produces results less than 1% away from optimum on average, with more than two orders of magnitude improvement in runtime. The algorithm can be brought online for hundred-core heterogeneous systems as it scales to systems comprised of 256 cores with less than 1 ms in overhead in worst cases. The intrinsic history awareness also provides flexibility to control cost induced by switching V/F pairs, migrating threads across cores, or tuning on/off micro-architectural resources.

[1]  Stacey Jeffery,et al.  HASS: a scheduler for heterogeneous multicore systems , 2009, OPSR.

[2]  Josep Torrellas,et al.  Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors , 2008, 2008 International Symposium on Computer Architecture.

[3]  Patrick Crowley,et al.  Dynamic thread assignment on heterogeneous multiprocessor architectures , 2006, CF '06.

[4]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures: the potential for processor power reduction , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[5]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures for multithreaded workload performance , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[6]  Lizy Kurian John,et al.  Efficient program scheduling for heterogeneous multi-core processors , 2009, 2009 46th ACM/IEEE Design Automation Conference.

[7]  Lieven Eeckhout,et al.  Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[8]  Meeta Sharma Gupta,et al.  System level analysis of fast, per-core DVFS using on-chip switching regulators , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[9]  Engin Ipek,et al.  Core fusion: accommodating software diversity in chip multiprocessors , 2007, ISCA '07.

[10]  Anuj Pathania,et al.  Price theory based power management for heterogeneous multi-cores , 2014, ASPLOS.

[11]  Yale N. Patt,et al.  MorphCore: An Energy-Efficient Microarchitecture for High Performance ILP and High Throughput TLP , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[12]  Diana Marculescu,et al.  Variation-aware dynamic voltage/frequency scaling , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[13]  Christine A. Shoemaker,et al.  Scalable thread scheduling and global power management for heterogeneous many-core architectures , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[14]  Christine A. Shoemaker,et al.  Flicker: a dynamically adaptive architecture for power limited multicore systems , 2013, ISCA.

[15]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[16]  Lieven Eeckhout,et al.  Scheduling heterogeneous multi-cores through performance impact estimation (PIE) , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[17]  Siddharth Garg,et al.  Learning the optimal operating point for many-core systems with extended range voltage/frequency scaling , 2013, 2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[18]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[19]  Richard M. Karp,et al.  Reducibility among combinatorial problems" in complexity of computer computations , 1972 .

[20]  J. Mitchell Branch-and-Cut Algorithms for Combinatorial Optimization Problems , 1988 .

[21]  Jian Li,et al.  Dynamic power-performance adaptation of parallel computation on chip multiprocessors , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[22]  Dheeraj Reddy,et al.  Bias scheduling in heterogeneous multi-core architectures , 2010, EuroSys '10.