Adaptive workload-aware task scheduling for single-ISA asymmetric multicore architectures

Single-ISA Asymmetric Multicore (AMC) architectures have shown both high performance and power efficiency. However, current parallel programming environments do not perform well on AMC architectures because they are designed for symmetric multicore architectures, in which all cores provide equal performance. Their random task scheduling policies can leave workloads unbalanced across the cores of an AMC architecture and severely degrade the performance of parallel applications. To balance the workloads of parallel applications on AMC architectures, this article proposes an adaptive Workload-Aware Task Scheduler (WATS) that consists of a history-based task allocator and a preference-based task scheduler. The history-based task allocator computes a near-optimal static task allocation from the historical statistics collected during the execution of a parallel application. The preference-based task scheduler, which schedules tasks according to a preference list, dynamically adjusts the workloads when the task allocation is suboptimal because of approximation in the history-based task allocator. Experimental results show that WATS improves both the performance and the energy efficiency of task-based applications, with a performance gain of up to 66.1% compared with traditional task schedulers.
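
The two components named in the abstract can be illustrated with a rough sketch. The code below is not the WATS algorithm itself, whose details the abstract does not give. It assumes a simple greedy heuristic for the static allocation and a speed-distance ordering for the preference list. Every name in it (`History`, `allocate`, `preference_list`, the `core_speeds` mapping, the task-class labels) is hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class History:
    """Accumulates observed execution time per task class (hypothetical helper)."""

    def __init__(self):
        self.total = defaultdict(float)

    def record(self, task_class, seconds):
        # Add one completed task's execution time to its class total.
        self.total[task_class] += seconds

    def workload(self, task_class):
        # Total work observed so far for this task class.
        return self.total[task_class]


def allocate(history, core_speeds):
    """Assumed greedy allocation: take the heaviest task class first and place
    it on the core group whose projected finish time (load / speed) is lowest."""
    load = {group: 0.0 for group in core_speeds}
    assignment = {}
    classes = sorted(history.total, key=history.workload, reverse=True)
    for task_class in classes:
        work = history.workload(task_class)
        best = min(core_speeds,
                   key=lambda g: (load[g] + work) / core_speeds[g])
        assignment[task_class] = best
        load[best] += work
    return assignment


def preference_list(group, core_speeds):
    """Assumed ordering: a core serves its own group first, then takes work
    from other groups, nearest in speed first."""
    others = sorted((g for g in core_speeds if g != group),
                    key=lambda g: abs(core_speeds[g] - core_speeds[group]))
    return [group] + others


# Example: two core groups, the fast one twice as quick as the slow one.
history = History()
for task_class, seconds in [("A", 8.0), ("B", 4.0), ("C", 2.0), ("D", 1.0)]:
    history.record(task_class, seconds)

speeds = {"fast": 2.0, "slow": 1.0}
plan = allocate(history, speeds)
```

In this example the fast group receives classes A and C (10 units of work at speed 2) and the slow group receives B and D (5 units at speed 1), so both groups are projected to finish at the same time. The preference list is what would let a core pick up work from another group at run time when the history-based estimate turns out to be inaccurate.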
