Scheduling of Parallelized Synchronous Dataflow Actors for Multicore Signal Processing

Parallelization of Digital Signal Processing (DSP) software is an important trend in Multiprocessor System-on-Chip (MPSoC) implementation. The performance of DSP systems composed of parallelized computations depends on the scheduling technique, which must in general allocate computation and communication resources among competing tasks and ensure that data dependencies are satisfied. In this paper, we formulate a new type of parallel task scheduling problem called Parallel Actor Scheduling (PAS) for MPSoC mapping of DSP systems that are represented as Synchronous Dataflow (SDF) graphs. In contrast to traditional SDF-based scheduling techniques, which focus on exploiting graph-level (inter-actor) parallelism, the PAS problem targets the integrated exploitation of both intra- and inter-actor parallelism for platforms in which individual actors can be parallelized across multiple processing units. We first address a special case of the PAS problem in which all of the actors in the DSP application or subsystem being optimized are parallel actors (i.e., they can be parallelized to exploit multiple cores). For this special case, we develop and experimentally evaluate a two-phase scheduling framework with three workflows that involve particle swarm optimization (PSO): PSO with a mixed integer programming formulation, PSO with simulated annealing, and PSO with a fast heuristic based on list scheduling. Then, we extend our scheduling framework to support the general PAS problem, which considers both parallel actors and sequential actors (actors that cannot be parallelized) in an integrated manner. We demonstrate that our PAS-targeted scheduling framework provides a useful range of trade-offs between synthesis time requirements and the quality of the derived solutions. We also evaluate the performance of our scheduling framework in two ways: through simulations on a diverse set of randomly generated SDF graphs, and through implementations of an image processing application and a software defined radio benchmark on a state-of-the-art multicore DSP platform.
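To make the problem setting concrete, the following is a minimal sketch of greedy list scheduling for actors that each occupy several cores at once. It is not the framework described in the abstract: the function name `list_schedule`, the longest-task-first priority rule, the assumption that each actor's core count and run time are already fixed, and the three-actor example are all hypothetical choices made for illustration only. Communication costs and SDF repetition counts are ignored.

```python
from collections import defaultdict


def list_schedule(exec_time, cores_needed, deps, num_cores):
    """Greedy list scheduling of multi-core actors on identical cores.

    exec_time[a]    : run time of actor a at its chosen core count (assumed given)
    cores_needed[a] : number of cores actor a occupies while it runs
    deps            : (producer, consumer) precedence edges; must be acyclic
    Returns (start_times, makespan).
    """
    preds = defaultdict(set)
    for producer, consumer in deps:
        preds[consumer].add(producer)

    core_free = [0.0] * num_cores  # time at which each core becomes idle
    start, finish = {}, {}
    pending = set(exec_time)

    while pending:
        # An actor is ready once all of its predecessors have been scheduled.
        ready = [a for a in pending if all(p in finish for p in preds[a])]
        if not ready:
            raise ValueError("precedence graph contains a cycle")

        # Illustrative priority rule: longest run time first, ties by name.
        actor = min(ready, key=lambda a: (-exec_time[a], a))
        k = cores_needed[actor]
        if k > num_cores:
            raise ValueError(f"{actor} needs more cores than are available")

        # Take the k cores that become idle earliest.
        chosen = sorted(range(num_cores), key=lambda c: core_free[c])[:k]
        data_ready = max((finish[p] for p in preds[actor]), default=0.0)
        t0 = max(data_ready, max(core_free[c] for c in chosen))

        start[actor] = t0
        finish[actor] = t0 + exec_time[actor]
        for c in chosen:
            core_free[c] = finish[actor]
        pending.remove(actor)

    return start, max(finish.values(), default=0.0)


# Hypothetical example: A and B each run on 2 cores, C runs on all 4 cores.
exec_time = {"A": 4.0, "B": 3.0, "C": 2.0}
cores_needed = {"A": 2, "B": 2, "C": 4}
deps = [("A", "C"), ("B", "C")]

start, makespan = list_schedule(exec_time, cores_needed, deps, num_cores=4)
print(start, makespan)
```

In this example A and B run side by side on two cores each (inter-actor parallelism), and C then spreads across all four cores (intra-actor parallelism). Choosing how many cores each actor should receive is the harder part of the problem and is left out of this sketch.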
