Using stream rewriting for mapping and scheduling data flow graphs onto many-core architectures

Dataflow graphs, consisting of concurrent actors connected by communication channels, are widely used to model multimedia applications. Because dataflow graphs explicitly expose the parallelism contained in the application, they lend themselves well to synthesis for many-core architectures. However, under varying and unpredictable workloads, a static mapping of actors to computing resources is often infeasible, while a dynamic approach is challenging because of the large number of actors. Our concept of stream rewriting represents a novel execution semantics for dataflow graphs on many-core architectures, which allows for a completely dynamic binding of actor instances to processing units. In addition, we present a distributed scheduling mechanism, global resource sharing, and lightweight lock-free synchronization based on pattern matching. Finally, an optimized architecture for stream rewriting is prototyped and evaluated.
