COLAB: a collaborative multi-factor scheduler for asymmetric multicore processors

Increasingly prevalent asymmetric multicore processors (AMPs) are necessary for delivering performance in an era of limited power budgets and dark silicon. However, software fails to use them efficiently. OS schedulers, in particular, handle asymmetry only in restricted scenarios. We have efficient symmetric schedulers, efficient asymmetric schedulers for single-threaded workloads, and efficient asymmetric schedulers for single-program workloads. What we do not have is a scheduler that can handle all the runtime factors that affect AMPs under multi-threaded, multi-programmed workloads. This paper introduces the first general-purpose asymmetry-aware scheduler for multi-threaded, multi-programmed workloads. It estimates the performance of each thread on each type of core and identifies communication patterns and bottleneck threads. The scheduler then makes coordinated core-assignment and thread-selection decisions that still give each application its fair share of processor time. We evaluate our approach using the gem5 simulator on four distinct big.LITTLE configurations and 26 mixed workloads composed of PARSEC and SPLASH-2 benchmarks. Compared to the state-of-the-art Linux CFS and AMP-aware schedulers, we demonstrate performance gains of up to 25%, and of 5% to 15% on average depending on the hardware setup.
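How per-thread speedup estimates, bottleneck information, and fairness can be combined into a single core-assignment decision is easier to see in a small example. The following is a minimal sketch, not the paper's actual policy. It assumes each runnable thread already carries a predicted big-versus-little speedup and a bottleneck score (the two estimates the abstract mentions), and it folds fairness in as a per-application correction based on accumulated big-core time. The `ThreadInfo` fields, the `assign_big_cores` function, the multiplicative scoring, and the +1 smoothing are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThreadInfo:
    tid: int
    app: str
    speedup: float         # predicted big-core / little-core performance ratio (assumed to be given)
    blocking: float        # hypothetical bottleneck score, e.g. fraction of sibling threads waiting on it
    big_time: float = 0.0  # big-core time this thread has received so far


def assign_big_cores(threads, num_big):
    """Split runnable threads into (big, little) for the next scheduling interval.

    Illustrative scoring only: predicted speedup, boosted for bottleneck threads,
    scaled by how far each application is from an equal share of big-core time.
    """
    # Big-core time accumulated per application.
    share = {}
    for t in threads:
        share[t.app] = share.get(t.app, 0.0) + t.big_time
    fair = sum(share.values()) / len(share) if share else 0.0

    def score(t):
        benefit = t.speedup * (1.0 + t.blocking)
        # > 1 for applications behind their fair share, < 1 for those ahead.
        # The +1.0 smoothing avoids division by zero at start-up.
        fairness = (fair + 1.0) / (share[t.app] + 1.0)
        return benefit * fairness

    # Highest score first; thread id breaks ties deterministically.
    ranked = sorted(threads, key=lambda t: (-score(t), t.tid))
    return ranked[:num_big], ranked[num_big:]


# Application A has already had 8 units of big-core time, B has had none.
threads = [
    ThreadInfo(0, "A", speedup=2.0, blocking=0.0, big_time=4.0),
    ThreadInfo(1, "A", speedup=1.9, blocking=0.0, big_time=4.0),
    ThreadInfo(2, "B", speedup=1.5, blocking=0.0),
    ThreadInfo(3, "B", speedup=1.2, blocking=1.0),
]
big, little = assign_big_cores(threads, num_big=2)
print([t.tid for t in big])  # → [3, 2]
```

In this run both big cores go to application B, even though A's threads have the higher raw speedups, because A is already ahead of its fair share; within B, the bottleneck thread 3 ranks first. With no accumulated history, the same four threads would instead yield threads 3 and 0 on the big cores, since speedup and bottleneck weight alone decide.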
