Dynamic thread mapping for high-performance, power-efficient heterogeneous many-core systems

This paper addresses the problem of dynamic thread mapping in heterogeneous many-core systems with an efficient algorithm that maximizes performance under power constraints. Heterogeneous many-core systems comprise multiple core types with different power-performance characteristics. As is well documented in the literature, the generic mapping problem is NP-complete. It can be formulated as a 0-1 integer linear program, which is prohibitively expensive to solve optimally in an online setting. In real applications, however, thread mapping decisions must respond quickly to workload phase changes. This paper proposes an iterative approach whose runtime is bounded by O(n^2/m) for mapping multi-threaded applications onto n cores of m core types. Compared with an optimal solution, the proposed algorithm produces results within 0.6% of the optimum on average and runs two orders of magnitude faster. Results show that performance improves by up to 16% under iso-power constraints compared with a random mapping. The algorithm can be deployed online in hundred-core heterogeneous systems: it scales to systems of 256 cores with less than one millisecond of overhead.
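To make the optimization problem concrete, the following is a minimal sketch under stated assumptions, not the paper's algorithm. Each thread t is assigned to exactly one core type k (in a 0-1 formulation, a binary variable x_{t,k}). The total estimated performance is maximized subject to per-type core counts and a chip-level power budget. The inputs `perf`, `power`, `cores`, and `budget` and all function names are hypothetical. `optimal_mapping` is plain exhaustive search (exact but exponential, m^n assignments), standing in for an ILP solver. `greedy_mapping` is a generic gain-per-watt heuristic that assumes type 0 is a low-power type with enough cores to host every thread. It is not the paper's iterative O(n^2/m) method, and it makes no claim to that bound.

```python
from itertools import product

# Hypothetical inputs: perf[t][k] and power[t][k] are the estimated
# performance and power of thread t on core type k; cores[k] is the
# number of cores of type k; budget is the chip-level power cap.

def is_feasible(assign, power, cores, budget):
    """Check per-type core capacity and the total power budget."""
    used = [0] * len(cores)
    total_power = 0.0
    for t, k in enumerate(assign):
        used[k] += 1
        total_power += power[t][k]
    return all(u <= c for u, c in zip(used, cores)) and total_power <= budget

def score(assign, perf):
    """Total performance of a mapping (the objective to maximize)."""
    return sum(perf[t][k] for t, k in enumerate(assign))

def optimal_mapping(perf, power, cores, budget):
    """Exhaustive search over all m**n assignments: exact but exponential."""
    best_assign, best = None, float("-inf")
    for assign in product(range(len(cores)), repeat=len(perf)):
        if is_feasible(assign, power, cores, budget):
            s = score(assign, perf)
            if s > best:
                best_assign, best = assign, s
    return best_assign, best

def greedy_mapping(perf, power, cores, budget):
    """Generic greedy heuristic (NOT the paper's algorithm): start with every
    thread on type 0 (assumed low-power and numerous enough), then repeatedly
    apply the single-thread move with the largest performance gain per extra watt."""
    assign = [0] * len(perf)
    if not is_feasible(assign, power, cores, budget):
        return None, float("-inf")
    while True:
        best_move, best_ratio = None, 0.0
        for t in range(len(perf)):
            cur = assign[t]
            for k in range(len(cores)):
                gain = perf[t][k] - perf[t][cur]
                if k == cur or gain <= 0:
                    continue  # only consider moves that improve performance
                candidate = assign.copy()
                candidate[t] = k
                if not is_feasible(candidate, power, cores, budget):
                    continue
                extra = power[t][k] - power[t][cur]
                ratio = gain / extra if extra > 0 else float("inf")
                if ratio > best_ratio:
                    best_move, best_ratio = (t, k), ratio
        if best_move is None:  # no improving feasible move remains
            return tuple(assign), score(assign, perf)
        t, k = best_move
        assign[t] = k  # each accepted move strictly raises the score, so the loop ends

# Example: 4 threads; type 0 = little cores (4 of them), type 1 = big cores (2).
PERF = [[1.0, 2.0], [1.0, 1.2], [1.0, 1.8], [1.0, 1.5]]
POWER = [[1.0, 3.0], [1.0, 3.0], [1.0, 3.0], [1.0, 3.0]]
CORES = [4, 2]
BUDGET = 8.0

print(optimal_mapping(PERF, POWER, CORES, BUDGET))  # → ((1, 0, 1, 0), 5.8)
print(greedy_mapping(PERF, POWER, CORES, BUDGET))   # → ((1, 0, 1, 0), 5.8)
```

Here the heuristic happens to match the optimum, but it need not. Consider two threads where a cheap, small-gain upgrade has the best gain-per-watt ratio. Taking it first can use up the budget that a larger upgrade on the other thread needed. On small instances, exhaustive search gives a reference point for measuring such gaps. The abstract describes this kind of comparison against the optimum.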