Co-Optimizing Core Allocation, Mapping and DVFS in Streaming Programs with Moldable Tasks for Energy Efficient Execution on Manycore Architectures

Stream programming abstracts away the complexity of parallelism by modeling a program as a set of streaming tasks. Tasks run repeatedly and may be internally parallel, i.e., moldable: each task can use one or several cores simultaneously. The throughput of a streaming application, as well as its energy consumption, depends strongly on scheduling, i.e., on how many cores each task is allocated, how tasks are mapped to cores, and at which frequency they run. Crown scheduling is a scheduling method that considerably reduces the combinatorial complexity of this problem by introducing a few additional restrictions, in particular on tasks' core allocation sizes and mapping. While it has previously been shown to outperform competing methods, the impact of these restrictions on schedule quality has so far never been analyzed quantitatively. In this paper, we first propose several improvements to crown schedulers that relax some of these restrictions. We also provide an Integer Linear Programming formulation that solves the same optimization problem without the inherent restrictions of crown scheduling. In an extreme case, and for a realistic execution platform model, a crown schedule might use 3.7 times as much energy as an unrestricted schedule. We show, however, that in practical benchmarks the difference is small, while crown schedulers compute schedules significantly faster than the unrestricted scheduler. We confirm this experimentally with benchmarks derived from random task collections, classic parallel algorithms, and the StreamIt benchmark suite.
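To give a feel for the kind of restriction meant here, the sketch below assumes the binary crown structure described in the crown scheduling literature: the p cores (p a power of two) are organized into 2p - 1 groups by recursive halving, and a task may only be allocated all cores of one such group. The abstract itself does not spell this structure out, and the function name `crown_groups` is hypothetical, chosen for illustration only.

```python
def crown_groups(p):
    """Enumerate the core groups of a binary crown over p cores.

    Assumption (not stated in the abstract): p is a power of two and the
    groups are obtained by recursively halving the full set of cores.
    """
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of two")
    groups = []
    size = p
    while size >= 1:
        # All groups of the current size, left to right.
        for start in range(0, p, size):
            groups.append(list(range(start, start + size)))
        size //= 2
    return groups


if __name__ == "__main__":
    p = 8
    restricted = len(crown_groups(p))   # 2p - 1 candidate groups
    unrestricted = 2 ** p - 1           # every non-empty subset of cores
    print(f"{p} cores: {restricted} crown groups vs. "
          f"{unrestricted} arbitrary core subsets")
```

Under this assumption, a task on 8 cores has 15 candidate groups instead of 255 arbitrary core subsets, which illustrates why the restricted problem is much smaller and also why some schedules become unreachable.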
