Dynamic Piecewise Linear/Regular Algorithms

In this paper, we present an extension of the class of piecewise linear algorithms (PLAs) that models one type of dynamic data dependency. This extension significantly increases the range of applications that can be parallelized and mapped to massively parallel processor arrays. For instance, many computationally intensive applications in video and image processing consist of nested loop programs with only a few simple run-time dependent conditionals. Furthermore, we outline the cases in which these extensions can be used directly, with slight changes, within traditional mapping methodologies based on loop parallelization in the polytope model. Additionally, we outline future research directions for the cases in which existing methods are inefficient.
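To make the phrase "nested loop programs with only a few simple run-time dependent conditionals" concrete, the following is a minimal sketch of such a loop nest. It is an illustration only and is not taken from the paper: the function name `adaptive_smooth`, the `threshold` parameter, and the filter itself are hypothetical. The first branch depends only on the loop index and can be resolved statically; the second branch depends on computed data values and is therefore known only at run time.

```python
def adaptive_smooth(image, threshold):
    """Hypothetical row-wise filter: a 2-deep loop nest with one
    run-time dependent conditional (illustrative, not from the paper)."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if j == 0:
                # Static condition: depends only on the iteration index.
                out[i][j] = image[i][j]
            elif abs(image[i][j] - out[i][j - 1]) > threshold:
                # Dynamic condition: depends on data computed at run time.
                # Large jump (an edge): keep the input pixel unchanged.
                out[i][j] = image[i][j]
            else:
                # Small jump: average with the previously computed output.
                out[i][j] = (image[i][j] + out[i][j - 1]) // 2
    return out


if __name__ == "__main__":
    print(adaptive_smooth([[10, 12, 100, 98]], 5))
```

In this sketch the dependence on `out[i][j - 1]` is uniform, but whether its value is actually used in the result is decided by a data-dependent test, which is the kind of behavior a purely static dependence description cannot express.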
