Scheduling for HPC Systems with Process Variation Heterogeneity

Variation in the CMOS manufacturing process causes the transistors on each chip to differ, so many-core chips are inherently heterogeneous. For example, the frequency and power consumption profiles of cores can span a wide range. This makes optimal scheduling of applications under a power budget computationally difficult, because the number of possible core configurations is combinatorially large. To address this, we model the performance and power consumption of HPC applications on such heterogeneous chips. Based on these models, we propose a scheduling framework using integer linear programming (ILP) that enables efficient scheduling under various power consumption and performance constraints. Using this framework, an HPC runtime system can decide how many and which cores of a chip to use, depending on the application, the properties of the chip, and the imposed constraints. Our results show that the framework finds configurations that are up to 2.5 times faster than those obtained from simple heuristics. We also propose several research directions for this problem based on our framework.
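
The following Python sketch illustrates why choosing cores is a combinatorial problem. It uses a deliberately simplified model that is an assumption here, not the model from this work. Each core has a fixed frequency and power draw. An application's throughput is proportional to the sum of the selected cores' frequencies. Unselected cores draw no power. Under these assumptions the ILP reduces to a 0/1 knapsack problem. Rather than calling an ILP solver, the sketch solves it by brute-force enumeration, which only scales to a handful of cores. The core profiles, the 20 W budget, and the names `Core`, `best_subset`, and `fastest_first` are hypothetical. The greedy "fastest cores first" rule is only one example of a simple heuristic and is not necessarily the one evaluated in this work.

```python
from itertools import combinations
from typing import NamedTuple, Sequence


class Core(NamedTuple):
    """Hypothetical per-core profile resulting from process variation."""
    freq_ghz: float  # highest stable frequency of this core
    power_w: float   # power drawn by this core while active at freq_ghz


def throughput(cores: Sequence[Core]) -> float:
    # Toy performance model (assumption): compute-bound work split in proportion
    # to core speed, so throughput is proportional to the sum of frequencies.
    return sum(c.freq_ghz for c in cores)


def power(cores: Sequence[Core]) -> float:
    # Toy power model (assumption): unselected cores draw no power.
    return sum(c.power_w for c in cores)


def best_subset(chip: Sequence[Core], budget_w: float) -> tuple[Core, ...]:
    """Exact answer to the 0/1 selection problem
         maximize sum_i f_i * x_i  s.t.  sum_i p_i * x_i <= P,  x_i in {0, 1}
    found by enumerating all 2**len(chip) subsets (tiny chips only)."""
    best: tuple[Core, ...] = ()
    for k in range(1, len(chip) + 1):
        for subset in combinations(chip, k):
            if power(subset) <= budget_w and throughput(subset) > throughput(best):
                best = subset
    return best


def fastest_first(chip: Sequence[Core], budget_w: float) -> tuple[Core, ...]:
    """Greedy heuristic: take cores by decreasing frequency while they fit."""
    chosen: list[Core] = []
    for core in sorted(chip, key=lambda c: c.freq_ghz, reverse=True):
        if power(chosen) + core.power_w <= budget_w:
            chosen.append(core)
    return tuple(chosen)


# Hypothetical six-core chip: the fastest core is also disproportionately power-hungry.
chip = [Core(3.0, 12.0), Core(2.8, 7.0), Core(2.6, 6.0),
        Core(2.2, 4.0), Core(2.0, 3.5), Core(1.8, 3.0)]
budget = 20.0

opt = best_subset(chip, budget)
heur = fastest_first(chip, budget)
print(len(opt), round(throughput(opt), 2), round(power(opt), 2))     # 4 9.4 20.0
print(len(heur), round(throughput(heur), 2), round(power(heur), 2))  # 2 5.8 19.0
```

On this made-up chip, the greedy rule spends most of the budget on the fast but power-hungry core. The exact search instead selects four slower cores. A practical framework would hand the formulation to an ILP solver and use application-specific performance and power models rather than exhaustive search.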