Efficient control generation for mapping nested loop programs onto processor arrays

Processor array architectures are well suited to computationally intensive applications. Such architectures are characterized by hierarchies of parallelism and memory structures: in addition to several levels of cache, a processor array has a large number of processing elements (PEs), and each PE may itself offer sub-word parallelism. To handle large-scale problems, balance local memory requirements against I/O bandwidth, and exploit the different levels of parallelism and memory, a sophisticated transformation called hierarchical partitioning is needed. The target applications are inherently data-flow dominant and have almost no control flow, but applying hierarchical partitioning has the drawback of making the control flow more complex. In a previous paper, the authors presented, for the first time, a methodology for automated control path synthesis when mapping partitioned algorithms onto processor arrays. However, the resulting control path contained complex multiplication and division operators. In this paper, we propose a significant extension to that methodology that reduces the hardware cost of the global controller and the memory address generators by avoiding these costly operations.
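To make the cost argument concrete, the following minimal sketch contrasts two ways of generating the control indices of a partitioned (tiled) one-dimensional loop. It is a generic illustration of replacing division and modulo with counters, not the method of the paper, which the abstract does not detail. The function names `indices_divmod` and `indices_counters` and the parameters `n` (iteration count) and `tile` (tile size) are hypothetical.

```python
def indices_divmod(n, tile):
    """Reference version: recover (tile index, local index) from the
    global iteration number using integer division and modulo."""
    return [(i // tile, i % tile) for i in range(n)]


def indices_counters(n, tile):
    """Counter version: produce the same sequence using only
    increment, compare, and reset (no division or multiplication)."""
    result = []
    outer, inner = 0, 0  # tile index and index within the tile
    for _ in range(n):
        result.append((outer, inner))
        inner += 1
        if inner == tile:  # end of the current tile reached
            inner = 0
            outer += 1
    return result


if __name__ == "__main__":
    # Both generators yield the same index sequence.
    print(indices_divmod(10, 4) == indices_counters(10, 4))
```

In hardware terms, the counter version corresponds to an incrementer and a comparator per partitioning level, whereas the reference version would need a divider. This is the kind of saving the abstract refers to, under the assumption that the iteration order is a plain tile-by-tile scan.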
