A lightweight run-time scheduler for multitasking multicore stream applications

Stream programming models promise dramatic improvements in developers' ability to express parallelism in their applications while enabling extremely efficient implementations on modern many-core processors. Unfortunately, the wide variation in the architectural features of available multi-core processors implies that a single compiler may be incapable of generating general solutions which can run on many target systems, or even on different configurations of the same system. In particular, off-line approaches for finding optimal mappings and schedules for a stream program on a specific processor are limited by their lack of portability across different processors, and by a lack of flexibility for run time variations in resource availability in typical multi-tasking environments. The paper presents a scheme that includes a lightweight compile-time sequencer, and a dynamic scheduler capable of mapping stream programs onto available cores in a multi-core processor at run-time. Unlike previous implementations, our scheme requires limited knowledge of the target architecture's resources at compile time. The off-line portion of the scheme generates canonical scheduling information about the stream program. This information is utilized by the lightweight run-time scheduling algorithm to generate application mappings in linear time based on available resources giving near optimal throughput. Evaluations of schedules generated for twelve streaming benchmarks gives an average of 96% and 93% of the theoretical optimum throughput for schedules with up to 4 and 128 cores, respectively.

[1]  Scott A. Mahlke,et al.  Flextream: Adaptive Compilation of Streaming Applications for Heterogeneous Architectures , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.

[2]  David Zhang,et al.  A lightweight streaming layer for multicore execution , 2008, CARN.

[3]  Karam S. Chatha,et al.  Compilation of stream programs for multicore processors that incorporate scratchpad memories , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[4]  Scott A. Mahlke,et al.  Orchestrating the execution of stream programs on multicore platforms , 2008, PLDI '08.

[5]  M. Suzuoki,et al.  Overview of the architecture, circuit design, and physical implementation of a first-generation cell processor , 2006, IEEE Journal of Solid-State Circuits.

[6]  E.A. Lee,et al.  Synchronous data flow , 1987, Proceedings of the IEEE.

[7]  Michael I. Gordon,et al.  Exploiting coarse-grained task, data, and pipeline parallelism in stream programs , 2006, ASPLOS XII.

[8]  William Thies,et al.  StreamIt: A Language for Streaming Applications , 2002, CC.

[9]  Kevin Skadron,et al.  Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).

[10]  Pat Hanrahan,et al.  Brook for GPUs: stream computing on graphics hardware , 2004, ACM Trans. Graph..

[11]  Gerard J. M. Smit,et al.  Monotonicity and run-time scheduling , 2009, EMSOFT '09.

[12]  Alexandros Stamatakis,et al.  Runtime scheduling of dynamic parallelism on accelerator-based multi-core systems , 2007, Parallel Comput..

[13]  Naehyuck Chang,et al.  Low-power color TFT LCD display for hand-held embedded systems , 2002, ISLPED '02.

[14]  Axel Jantsch,et al.  Modeling embedded systems and SoCs - concurrency and time in models of computation , 2003, The Morgan Kaufmann series in systems on silicon.