Mapping Computations in Heterogeneous Multicore Systems with Statistical Regression on Inputs

Heterogeneous multicore systems, such as ARM big.LITTLE, combine different types of processors to reconcile high performance with low energy consumption. A key question for such systems is how to find the best hardware configuration (type and frequency of processors) for a given program. Current solutions are either completely dynamic, based on online profiling, or completely static, based on supervised machine learning. Whereas the former approach can incur unwanted runtime overhead, the latter fails to account for the diversity of program inputs. In this paper, we design and evaluate a compilation strategy, Jinn-c, that performs statistical regression on function arguments in order to match actual parameters with ideal hardware configurations at runtime. We show that Jinn-c, implemented in the Soot compiler, can predict the best configuration for a suite of Java and Scala programs running on an Odroid XU4 board, outperforming prior techniques such as ARM's GTS and CHOAMP, a recently released static program scheduler.
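To make the idea of regression on function arguments more concrete, here is a minimal Python sketch of one way such a scheme could work. It is not Jinn-c's actual model. Several assumptions here are ours: each hardware configuration gets its own linear model that predicts a function's cost (for example, runtime) from a single numeric argument such as an input size, and at call time the configuration with the lowest predicted cost is chosen. The configuration labels `LITTLE` and `big`, the helper names `fit_linear`, `train` and `best_config`, and the profiling numbers are all hypothetical.

```python
"""Sketch: choose a hardware configuration from a function's input.

Illustrative assumptions only (not Jinn-c's model): every configuration is
profiled at a few input sizes n, and its cost is modeled as
cost ~ intercept + slope * n.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LinearModel:
    """Predicts a cost (e.g. runtime) from a single numeric argument."""

    intercept: float
    slope: float

    def predict(self, n: float) -> float:
        return self.intercept + self.slope * n


def fit_linear(xs: list[float], ys: list[float]) -> LinearModel:
    """Ordinary least-squares fit with one predictor."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("input sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LinearModel(intercept=mean_y - slope * mean_x, slope=slope)


def train(profiles: dict[str, list[tuple[float, float]]]) -> dict[str, LinearModel]:
    """Fit one model per configuration from (input size, cost) samples."""
    return {
        config: fit_linear([n for n, _ in samples], [c for _, c in samples])
        for config, samples in profiles.items()
    }


def best_config(models: dict[str, LinearModel], n: float) -> str:
    """Runtime dispatch: pick the configuration with the lowest predicted cost."""
    return min(models, key=lambda config: models[config].predict(n))


if __name__ == "__main__":
    # Made-up profiling data: the hypothetical LITTLE configuration starts
    # cheaply but scales worse; the hypothetical big one pays a fixed
    # overhead but processes each element faster.
    profiles = {
        "LITTLE": [(100, 12.0), (200, 22.0), (400, 42.0)],  # cost = 2 + 0.1 n
        "big": [(100, 25.0), (200, 30.0), (400, 40.0)],     # cost = 20 + 0.05 n
    }
    models = train(profiles)
    print(best_config(models, 100))   # LITTLE
    print(best_config(models, 1000))  # big
```

With these invented samples the two fitted models cross at an input size of 360, so smaller inputs are sent to the LITTLE configuration and larger ones to the big configuration. Using several arguments (multivariate regression), or treating each frequency level as an additional configuration, would keep the same shape for the dispatch decision.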
