Contention-Aware Scheduling for Asymmetric Multicore Processors

Asymmetric multicore processors (AMPs) have been proposed as an energy-efficient alternative to symmetric multicore processors (SMPs). However, AMPs derive their performance from core specialization, which requires co-running applications to be scheduled on the core types best suited to them. Despite extensive research on AMP scheduling, developing an effective scheduling algorithm remains challenging. Contention for shared resources is a key performance-limiting factor, and it often renders existing contention-oblivious scheduling algorithms ineffective. We introduce a contention-aware scheduling algorithm for ARM's big.LITTLE, a commercial AMP platform. Our algorithm comprises an offline stage and an online stage. The offline stage builds a performance interference model for an application by training it against a set of co-running applications. Guided by this model, the online stage schedules a workload by assigning each of its applications to the most appropriate core type so as to minimize the performance degradation caused by contention for shared resources. Our model predicts the performance degradation of an application co-running with other applications with an average prediction error of 9.60%. Compared with the default scheduler provided for ARM's big.LITTLE and with the speedup-factor-driven scheduler, our contention-aware scheduler improves overall system performance by up to 28.32% and 28.51%, respectively.
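The two-stage design can be illustrated with a toy sketch. The abstract does not specify the form of the interference model, so the code below assumes a hypothetical linear model: an application's fractional slowdown on a core type equals a fitted sensitivity times the summed contention "pressure" of its co-runners on the same cluster, and contention is treated as occurring only within a cluster. The names `fit_sensitivity`, `predicted_time`, `schedule`, the `pressure` metric, and all numbers are illustrative assumptions, not the authors' model or data.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical profiling tables (assumptions, not from the paper):
#   solo[app][core] -> solo-run execution time on a core type ("big" or "LITTLE")
#   pressure[app]   -> scalar contention pressure an app exerts on co-runners
#   sens[app][core] -> fitted sensitivity of an app to co-runner pressure
Table = Dict[str, Dict[str, float]]


def fit_sensitivity(samples: Sequence[Tuple[float, float]]) -> float:
    """Offline stage (assumed linear model): fit k in slowdown ~= k * pressure
    by least squares through the origin over (aggregate co-runner pressure,
    observed fractional slowdown) pairs from training co-runs."""
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    return sxy / sxx if sxx > 0 else 0.0


def predicted_time(app: str, core: str, corunners: List[str],
                   solo: Table, pressure: Dict[str, float], sens: Table) -> float:
    """Solo time inflated by the predicted slowdown from same-cluster co-runners."""
    agg = sum(pressure[c] for c in corunners)
    return solo[app][core] * (1.0 + sens[app][core] * agg)


def schedule(apps: List[str], big_slots: int, little_slots: int,
             solo: Table, pressure: Dict[str, float],
             sens: Table) -> Optional[Tuple[float, Dict[str, str]]]:
    """Online stage: try every big/LITTLE split that fits the core counts and
    keep the one with the lowest total predicted runtime, each app normalized
    to its solo time on a big core. Returns None if the apps do not fit."""
    best: Optional[Tuple[float, Dict[str, str]]] = None
    n = len(apps)
    for k in range(max(0, n - little_slots), min(big_slots, n) + 1):
        for big in combinations(apps, k):
            little = [a for a in apps if a not in big]
            cost = 0.0
            for group, core in ((list(big), "big"), (little, "LITTLE")):
                for a in group:
                    others = [b for b in group if b != a]
                    cost += predicted_time(a, core, others, solo, pressure, sens) / solo[a]["big"]
            if best is None or cost < best[0]:
                best = (cost, {a: ("big" if a in big else "LITTLE") for a in apps})
    return best


# Illustrative numbers only: two memory-intensive and two compute-intensive apps.
apps = ["mem1", "mem2", "cpu1", "cpu2"]
solo = {"mem1": {"big": 10.0, "LITTLE": 15.0}, "mem2": {"big": 10.0, "LITTLE": 15.0},
        "cpu1": {"big": 10.0, "LITTLE": 25.0}, "cpu2": {"big": 10.0, "LITTLE": 25.0}}
pressure = {"mem1": 1.0, "mem2": 1.0, "cpu1": 0.1, "cpu2": 0.1}
mem_big = fit_sensitivity([(1.0, 0.5), (2.0, 1.0)])  # fitted slope is 0.5
sens = {"mem1": {"big": mem_big, "LITTLE": 1.0}, "mem2": {"big": mem_big, "LITTLE": 1.0},
        "cpu1": {"big": 0.1, "LITTLE": 0.1}, "cpu2": {"big": 0.1, "LITTLE": 0.1}}
cost, assignment = schedule(apps, 2, 2, solo, pressure, sens)
print(assignment)  # the two memory-intensive apps land on different clusters
```

In this toy input, the compute-bound apps have the larger big-to-LITTLE speedup (2.5× versus 1.5×), so a policy driven only by speedup would favor placing both on big cores; under the assumed model that choice is predicted to cost more than splitting the memory-intensive pair across clusters, because their mutual interference dominates. Exhaustive enumeration is only practical for small workloads; larger ones would call for a heuristic search.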