Expressing Inter-task Dependencies between Parallel Stencil Operations

Complex embedded systems are designed under tight constraints on response time, resource usage and cost. Design space exploration tools help designers map and schedule embedded software onto complex architectures such as heterogeneous MPSoCs. Task graphs are coarse-grained representations of parallel program behaviour that are used to evaluate the feasibility of a particular design. However, automatically extracting an accurate task graph from source code is challenging. This paper investigates how to describe data dependencies so that tools based on program analysis can extract task graphs from source code. We examine a common parallel programming pattern, stencil operations, and show that even for such codes, which have regular control flow, the precise dependencies between two stencil operations cannot always be determined at compile time. We introduce a language construct that i) captures an upper bound on the number of dependencies between successive stencil operations and ii) instructs the compiler to generate code that ensures the bound holds for each execution of the program. The impact of our proposal is evaluated using a micro-benchmark and two soft real-time embedded image processing applications. The coding effort is low: at most one line of code was added per parallel loop. The performance impact is evaluated on a quad-core Linux workstation, where we observe no statistically significant slowdown.
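To make the dependency problem concrete, the following is a minimal sketch in C. It is not the construct proposed in the paper. It assumes a one-dimensional array split into fixed-size chunks, with each chunk treated as one task, and a second stencil whose radius is only known at run time. The names `chunk_deps`, `bound_holds` and `stencil` are hypothetical and chosen for illustration only. The sketch shows why the number of producer chunks that a consumer chunk reads from depends on a run-time value, and how a declared upper bound could be checked before the loop executes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: number of producer chunks that the consumer chunk
   [lo, hi) reads from when the consumer stencil has the given radius.
   Reads outside [0, n) are clamped to the array borders. */
int chunk_deps(int lo, int hi, int radius, int chunk, int n)
{
    int first = lo - radius;       /* leftmost element read  */
    int last = hi - 1 + radius;    /* rightmost element read */
    if (first < 0) first = 0;
    if (last > n - 1) last = n - 1;
    return last / chunk - first / chunk + 1;
}

/* Hypothetical run-time check: returns 1 if every consumer chunk depends
   on at most max_deps producer chunks, and 0 otherwise. A compiler could
   emit a check of this kind in front of an annotated parallel loop. */
int bound_holds(int n, int chunk, int radius, int max_deps)
{
    for (int lo = 0; lo < n; lo += chunk) {
        int hi = (lo + chunk < n) ? lo + chunk : n;
        if (chunk_deps(lo, hi, radius, chunk, n) > max_deps)
            return 0;
    }
    return 1;
}

/* Simple 1D averaging stencil over a window of 2 * radius + 1 elements,
   truncated at the array borders. */
void stencil(const double *in, double *out, int n, int radius)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        int count = 0;
        for (int k = -radius; k <= radius; k++) {
            int j = i + k;
            if (j >= 0 && j < n) {
                sum += in[j];
                count++;
            }
        }
        out[i] = sum / count;
    }
}
```

With 64 elements in chunks of 16, a radius of 2 makes each consumer chunk depend on at most three producer chunks, whereas a radius of 17 makes some chunks depend on four. A task graph extracted under an assumed bound of three is therefore only valid for executions where a check such as `bound_holds(64, 16, radius, 3)` succeeds.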
