Crown scheduling: Energy-efficient resource allocation, mapping and discrete frequency scaling for collections of malleable streaming tasks

We investigate the problem of generating energy-optimal code for a collection of streaming tasks that includes parallelizable (malleable) tasks on a generic many-core processor with dynamic discrete frequency scaling. Streaming task collections differ from classical task sets in that all tasks run concurrently, so that each core typically runs several tasks, scheduled round-robin at user level in a data-driven way. A stream of data flows through the tasks, and intermediate results are forwarded to other tasks as in a pipelined task graph. In this paper we present crown scheduling, a novel technique for the combined optimization of resource allocation, mapping and discrete voltage/frequency scaling for malleable streaming task sets, with the goal of optimizing energy efficiency under a throughput constraint. We present optimal off-line algorithms for separate and integrated crown scheduling based on integer linear programming (ILP). We also propose extensions for dynamic rescaling that automatically adapt a given crown schedule in situations where not all tasks are data-ready. Our energy model considers both static idle power and dynamic power consumption of the processor cores. Our experimental evaluation of the ILP models for a generic many-core architecture shows that, at least for small and medium-sized task sets, even the integrated variant of crown scheduling can be solved to optimality by a state-of-the-art ILP solver within a few seconds.
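To make the optimization objective concrete, the following is a minimal sketch of discrete frequency selection under a throughput constraint for a single core that runs its tasks round-robin. It is not the crown scheduling ILP: it ignores allocation and mapping and simply enumerates all frequency assignments. The frequency levels `FREQS`, the idle power `P_IDLE`, the cubic dynamic power model and all function names are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical discrete frequency levels and static idle power (assumed values).
FREQS = (1.0, 2.0, 3.0)
P_IDLE = 0.5


def dynamic_power(f):
    """Assumed cubic dynamic power model; the paper's model may differ."""
    return f ** 3


def schedule_energy(work, freqs, deadline):
    """Energy of one round on a single core, or None if the round is too long.

    work[i] is the work of task i, freqs[i] the frequency it runs at,
    deadline the maximum round time implied by the throughput constraint.
    """
    busy = sum(w / f for w, f in zip(work, freqs))
    if busy > deadline:
        return None  # throughput constraint violated
    dynamic = sum((w / f) * dynamic_power(f) for w, f in zip(work, freqs))
    idle = (deadline - busy) * P_IDLE  # core idles for the rest of the round
    return dynamic + idle


def best_frequencies(work, deadline):
    """Enumerate all frequency assignments and return (energy, freqs) or None."""
    best = None
    for freqs in product(FREQS, repeat=len(work)):
        energy = schedule_energy(work, freqs, deadline)
        if energy is not None and (best is None or energy < best[0]):
            best = (energy, freqs)
    return best


if __name__ == "__main__":
    print(best_frequencies([2.0, 4.0], deadline=4.0))
```

Exhaustive enumeration grows exponentially with the number of tasks, which is one reason an ILP formulation solved by a dedicated solver is the more practical route for anything beyond toy inputs.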
