Compiler-assisted Adaptive Program Scheduling in big.LITTLE Systems

Energy-aware architectures provide applications with a mix of low- and high-frequency cores. Selecting the best core configuration for a running program is challenging. Here, we leverage compilation, runtime monitoring, and machine learning to map program phases to their best-matching configurations. As a proof of concept, we devise the Astro system to show that our approach can outperform a state-of-the-art Linux scheduler for heterogeneous architectures.
